Main Questions
Our main questions:
- How have global CO2 emission rates changed over time? In particular, how have they changed for the US, and how does the US compare to other countries?
- Are US CO2 emissions, global temperatures, and US storm rates associated?
Disclaimer: The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts.
This case study explores how different countries have contributed to Carbon Dioxide (CO2) emissions over time and how CO2 emission rates may relate to increasing global temperatures and increased rates of natural disasters and storms. This report provides a basis for the motivation: https://www.epa.gov/report-environment/greenhouse-gases.
CO2 makes up the largest proportion of greenhouse gas emissions in the United States:
A variety of sources and sectors contribute to greenhouse gas emissions, with transportation contributing the most metric tons of CO2:
So why should we pay attention to greenhouse gases?
According to the US Environmental Protection Agency (EPA) Inventory of U.S. Greenhouse Gas Emissions and Sinks 2020 Report:
Greenhouse gases absorb infrared radiation, thereby trapping heat in the atmosphere and making the planet warmer. The most important greenhouse gases directly emitted by humans include carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), and several fluorine-containing halogenated substances. Although CO2, CH4, and N2O occur naturally in the atmosphere, human activities have changed their atmospheric concentrations. From the pre-industrial era (i.e., ending about 1750) to 2018, concentrations of these greenhouse gases have increased globally by 46, 165, and 23 percent, respectively (IPCC 2013; NOAA/ESRL 2019a, 2019b, 2019c).
There are many signs that our planet is experiencing warmer temperatures:
The connection between greenhouse gas levels and global temperatures and the influence of increased global temperatures on human health are motivated by these reports:
Melillo, J.M., T.C. Richmond, and G.W. Yohe (eds.). 2014. Climate change impacts in the United States: The third National Climate Assessment. U.S. Global Change Research Program.
The National Climate Assessment Report states that:
Heat-trapping gases already in the atmosphere have committed us to a hotter future with more climate-related impacts over the next few decades. The magnitude of climate change beyond the next few decades depends primarily on the amount of heat-trapping gases that human activities emit globally, now and in the future.
In this case study, we will explore CO2 emission data from around the world. We will also focus on the US specifically to evaluate patterns of temperature and storm activity. This case study will particularly focus on how to use different datasets that span different ranges of time, as well as how to create visualizations of patterns over time. We will especially focus on using packages and functions from the Tidyverse, such as dplyr, tidyr, ggplot2, and gganimate. The Tidyverse is a collection of packages created by RStudio. While some students may be more familiar with base R, these packages make data science in R especially efficient.
We will begin by loading the packages that we will need:
library(here)
library(readxl)
library(readr)
library(dplyr)
library(magrittr)
library(tidyverse)
#library(plotly)
library(ggplot2)
library(gganimate)
library(directlabels)
library(ggrepel)
library(RColorBrewer)
library(patchwork)

| Package | Use |
|---|---|
| here | to easily load and save data |
| readxl | to import the excel file data |
| readr | to import the csv file data |
| dplyr | to view and wrangle the data |
| magrittr | to use and reassign data objects using the %<>% pipe operator |
| tidyverse | to wrangle the data and create ggplot2 plots |
| plotly | to make interactive visualizations |
| ggplot2 | to make visualizations |
| directlabels | to add labels to plots easily |
| ggrepel | to add labels that don’t overlap to plots |
| gganimate | to animate the plots |
| RColorBrewer | to have greater control over the color in our plots |
| patchwork | to combine plots |
The first time we use a function, we will use :: to indicate which package it comes from. Unless there are overlapping function names, this is not strictly necessary, but we will include it here to be clear about where each function we use comes from.
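As a small illustration of this convention (using the built-in mtcars dataset just for demonstration), the following two calls are equivalent once dplyr is loaded:

```r
library(dplyr)

# Explicitly indicating which package the function comes from:
dplyr::glimpse(mtcars)

# Equivalent once dplyr is loaded:
glimpse(mtcars)
```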
Greenhouse gas emissions are due to both natural processes and anthropogenic (human-derived) activities.
These emissions are one of the contributing factors to rising global temperatures, which can have a great influence on public health as illustrated in the following image:
Gases in the atmosphere can contribute to climate change both directly and indirectly. Direct effects occur when the gas itself absorbs radiation. Indirect radiative forcing occurs when chemical transformations of the substance produce other greenhouse gases, when a gas influences the atmospheric lifetimes of other gases, and/or when a gas affects atmospheric processes that alter the radiative balance of the earth (e.g., affect cloud formation or albedo). The IPCC developed the Global Warming Potential (GWP) concept to compare the ability of a greenhouse gas to trap heat in the atmosphere relative to another gas. The GWP of a greenhouse gas is defined as the ratio of the accumulated radiative forcing within a specific time horizon caused by emitting 1 kilogram of the gas, relative to that of the reference gas CO2 (IPCC 2013). Therefore GWP-weighted emissions are provided in million metric tons of CO2 equivalent (MMT CO2 Eq.)
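As a hypothetical worked example of the GWP concept (the emission amount and GWP value below are made up for illustration, not taken from the inventory report), emissions of a gas are converted to MMT CO2 Eq. by multiplying by its GWP:

```r
# Hypothetical GWP-weighting example. If a gas had a GWP of 25
# (i.e., 1 kg traps as much heat over the time horizon as 25 kg of CO2),
# then 10 MMT of that gas would be reported as:
emissions_mmt <- 10
gwp <- 25
emissions_mmt * gwp # 250 MMT CO2 Eq.
```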
CO2 is actually the least effective of the major greenhouse gases at trapping heat:
However, because CO2 is so much more abundant and stays in the atmosphere so much longer than other greenhouse gases, it has been the largest contributor to global warming.
See here for more details.
Furthermore, rising CO2 levels also influence ocean acidity:
This makes it difficult for organisms to maintain their shells or skeletons that are made of calcium carbonate, thus making it more difficult for these organisms to survive and impacting their role in the ecosystem and food chain.
Furthermore, greenhouse gas emissions are believed to influence storm rates.
Indeed, events with high levels of precipitation, which can induce flooding and property damage, are generally increasing around the country:
There are some important considerations regarding this data analysis to keep in mind:
The datasets only include countries and years for which countries reported such information to the agencies that collected the data. Thus the data are incomplete. For example, while we have a fairly good sense of CO2 emissions globally for later years, additional emissions were also produced by countries that are not included in the data.
Correlation or association does not imply causation. We will be showing how different datasets display similar trends across time. This does not imply that one caused the other. In the case of some of the data we will show, there is additional scientific evidence to suggest that, for example, increased CO2 emissions may cause increased temperatures or increased rates of disasters. However, simply showing a similar trend over time does not in itself prove that two variables are causally related. As you can see from this plot, data may often show a similar pattern over time by random chance. See this website for more examples.
In this case study we will be using data related to CO2 emissions, as well as other data that may influence, be influenced by, or relate to CO2 emissions. Most of our data was obtained from Gapminder, which is a unique nonprofit that provides a variety of data for free.
In their words, Gapminder is…
Gapminder is an independent Swedish foundation with no political, religious or economic affiliations. Gapminder is a fact tank, not a think tank. Gapminder fights devastating misconceptions about global development. Gapminder produces free teaching resources making the world understandable based on reliable statistics. Gapminder promotes a fact-based worldview everyone can understand. Gapminder collaborates with universities, UN, public agencies and non-governmental organizations. All Gapminder activities are governed by the board. We do not award grants. Gapminder Foundation is registered at Stockholm County Administration Board. Our constitution can be found here.
The data that we will be using from Gapminder was obtained from the World Bank.
In addition we will use some data that is specific to the United States from the [National Oceanic and Atmospheric Administration (NOAA)](https://www.noaa.gov/), which is an agency that collects weather and climate data.
| Data | Time span | Source | Original Source | Description | Citation |
|---|---|---|---|---|---|
| CO2 emissions | 1751 to 2014 | Gapminder | Carbon Dioxide Information Analysis Center (CDIAC) | CO2 emissions in tonnes or metric tons (equivalent to approximately 2,204.6 pounds) per person by country | NA |
| GDP per capita, percent yearly growth | 1801 to 2019 | Gapminder | World Bank | Gross Domestic Product (an overall measure of the health of a nation's economy) per person by country | NA |
| Energy use per person | 1960 to 2015 | Gapminder | World Bank | Use of primary energy before transformation to other end-use fuels, by country | NA |
| Crude Mortality Rate | 1960 to 2018 | World Bank | World Bank | Death rate per 1,000 people by country | NA |
| US Natural Disasters | 1980 to 2019 | The National Oceanic and Atmospheric Administration (NOAA) | The National Oceanic and Atmospheric Administration (NOAA) | US data about droughts, floods, freezes, severe storms, tropical cyclones, wildfires, and winter storms | NOAA National Centers for Environmental Information (NCEI) U.S. Billion-Dollar Weather and Climate Disasters (2020). https://www.ncdc.noaa.gov/billions/, DOI: 10.25921/stkw-7w73 |
| Temperature | 1895 to 2019 | The National Oceanic and Atmospheric Administration (NOAA) | The National Oceanic and Atmospheric Administration (NOAA) | US national yearly average temperature (in Fahrenheit) from 1895 to 2019 | NOAA National Centers for Environmental information, Climate at a Glance: National Time Series, published June 2020, retrieved on June 26, 2020 from https://www.ncdc.noaa.gov/cag/ |
To obtain the temperature data, annual average temperatures were selected as shown in this image:
Importantly, notice that the data we would like to use span different time periods:
| Data | Time span |
|---|---|
| CO2 emissions | 1751 to 2014 |
| GDP per capita, yearly growth | 1801 to 2019 |
| Energy use per person | 1960 to 2015 |
| Crude Mortality Rate | 1960 to 2018 |
| US Natural Disasters | 1980 to 2019 |
| Temperature | 1895 to 2019 |
To read in the files that were downloaded from the various sources indicated in the table above, we will use the read_xlsx() and read_xls() functions of the readxl package to import the data from the .xlsx and .xls files respectively, and we will use the read_csv() function of the readr package to import the data from the csv files.
# xlsx files:
CO2_emissions <- readxl::read_xlsx(here("docs/yearly_co2_emissions_1000_tonnes.xlsx"))
gdp_growth <- readxl::read_xlsx(here("docs/gdp_per_capita_yearly_growth.xlsx"))
energy_use <- readxl::read_xlsx(here("docs/energy_use_per_person.xlsx"))
# xls file:
mortality <- readxl::read_xls(here("docs/API_SP.DYN.CDRT.IN_DS2_en_excel_v2_804384.xls"))

For our csv data files, there are some lines that we do not want to import - in fact, we will get an error if we try to import them because the table structure will not be what R expects. We can skip these lines using the skip = argument of the read_csv() function.
Here you can see that the first two rows of the data about US disasters don't have the same number of columns as the subsequent rows. Since we want to skip these first two lines, we will use skip = 2.
Now looking at the temperature data, we can see that the first four lines do not have the same number of columns as the subsequent lines. We will skip these four lines by using skip = 4. We can also specify that NA values are encoded as "-99", which will replace all instances of "-99" with NA. We can do this using the na = argument of the read_csv() function, as na = "-99". The "-99" needs to be in quotation marks because this argument expects characters.
#csv files:
us_disaster <- readr::read_csv(here("docs/time-series-US.csv"), skip = 2)
us_temperature <- readr::read_csv(here("docs/temperature.csv"), skip = 4, na ="-99")
mortality2 <- readr::read_csv(here("docs/mortality.csv"), skip = 5)

Great! Now we have imported all of the data that we will need.
Now we will take a look at our data and wrangle it until it is easy to use to allow us to evaluate how CO2 emissions have changed over time and how emissions may relate to energy use, mortality, GDP etc.
First let’s take a look at the CO2 data. We can use the slice_head() function of the dplyr package to see just the first rows of our data. We can specify how many rows we would like to see with the n = argument. It is also useful to use the slice_sample() function to look at a random selection of rows.
We will use the %>% pipe which can be used to define the input for later sequential steps. This will make more sense when we have multiple sequential steps using the same data object. To use the pipe notation we need to install and load the dplyr package.
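The output below was produced with calls along these lines (a sketch, using the CO2_emissions object imported earlier):

```r
# First 6 rows of the CO2 data:
CO2_emissions %>%
  dplyr::slice_head(n = 6)

# A random sample of 10 rows:
CO2_emissions %>%
  dplyr::slice_sample(n = 10)
```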
# A tibble: 6 x 265
country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghan… NA NA NA NA NA NA NA NA NA NA
2 Albania NA NA NA NA NA NA NA NA NA NA
3 Algeria NA NA NA NA NA NA NA NA NA NA
4 Andorra NA NA NA NA NA NA NA NA NA NA
5 Angola NA NA NA NA NA NA NA NA NA NA
6 Antigu… NA NA NA NA NA NA NA NA NA NA
# … with 254 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>,
# `1764` <dbl>, `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>,
# `1769` <dbl>, `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>,
# `1774` <dbl>, `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>,
# `1779` <dbl>, `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>,
# `1784` <dbl>, `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>,
# `1789` <dbl>, `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>,
# `1794` <dbl>, `1795` <dbl>, `1796` <dbl>, `1797` <dbl>, `1798` <dbl>,
# `1799` <dbl>, `1800` <dbl>, `1801` <dbl>, `1802` <dbl>, `1803` <dbl>,
# `1804` <dbl>, `1805` <dbl>, `1806` <dbl>, `1807` <dbl>, `1808` <dbl>,
# `1809` <dbl>, `1810` <dbl>, `1811` <dbl>, `1812` <dbl>, `1813` <dbl>,
# `1814` <dbl>, `1815` <dbl>, `1816` <dbl>, `1817` <dbl>, `1818` <dbl>,
# `1819` <dbl>, `1820` <dbl>, `1821` <dbl>, `1822` <dbl>, `1823` <dbl>,
# `1824` <dbl>, `1825` <dbl>, `1826` <dbl>, `1827` <dbl>, `1828` <dbl>,
# `1829` <dbl>, `1830` <dbl>, `1831` <dbl>, `1832` <dbl>, `1833` <dbl>,
# `1834` <dbl>, `1835` <dbl>, `1836` <dbl>, `1837` <dbl>, `1838` <dbl>,
# `1839` <dbl>, `1840` <dbl>, `1841` <dbl>, `1842` <dbl>, `1843` <dbl>,
# `1844` <dbl>, `1845` <dbl>, `1846` <dbl>, `1847` <dbl>, `1848` <dbl>,
# `1849` <dbl>, `1850` <dbl>, `1851` <dbl>, `1852` <dbl>, `1853` <dbl>,
# `1854` <dbl>, `1855` <dbl>, `1856` <dbl>, `1857` <dbl>, `1858` <dbl>,
# `1859` <dbl>, `1860` <dbl>, …
# A tibble: 10 x 265
country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Thaila… NA NA NA NA NA NA NA NA NA NA
2 Cote d… NA NA NA NA NA NA NA NA NA NA
3 Sao To… NA NA NA NA NA NA NA NA NA NA
4 Grenada NA NA NA NA NA NA NA NA NA NA
5 Zimbab… NA NA NA NA NA NA NA NA NA NA
6 Mozamb… NA NA NA NA NA NA NA NA NA NA
7 Somalia NA NA NA NA NA NA NA NA NA NA
8 Libya NA NA NA NA NA NA NA NA NA NA
9 Poland NA NA NA NA NA NA NA NA NA NA
10 Belgium NA NA NA NA NA NA NA NA NA NA
# … with 254 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>,
# `1764` <dbl>, `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>,
# `1769` <dbl>, `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>,
# `1774` <dbl>, `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>,
# `1779` <dbl>, `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>,
# `1784` <dbl>, `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>,
# `1789` <dbl>, `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>,
# `1794` <dbl>, `1795` <dbl>, `1796` <dbl>, `1797` <dbl>, `1798` <dbl>,
# `1799` <dbl>, `1800` <dbl>, `1801` <dbl>, `1802` <dbl>, `1803` <dbl>,
# `1804` <dbl>, `1805` <dbl>, `1806` <dbl>, `1807` <dbl>, `1808` <dbl>,
# `1809` <dbl>, `1810` <dbl>, `1811` <dbl>, `1812` <dbl>, `1813` <dbl>,
# `1814` <dbl>, `1815` <dbl>, `1816` <dbl>, `1817` <dbl>, `1818` <dbl>,
# `1819` <dbl>, `1820` <dbl>, `1821` <dbl>, `1822` <dbl>, `1823` <dbl>,
# `1824` <dbl>, `1825` <dbl>, `1826` <dbl>, `1827` <dbl>, `1828` <dbl>,
# `1829` <dbl>, `1830` <dbl>, `1831` <dbl>, `1832` <dbl>, `1833` <dbl>,
# `1834` <dbl>, `1835` <dbl>, `1836` <dbl>, `1837` <dbl>, `1838` <dbl>,
# `1839` <dbl>, `1840` <dbl>, `1841` <dbl>, `1842` <dbl>, `1843` <dbl>,
# `1844` <dbl>, `1845` <dbl>, `1846` <dbl>, `1847` <dbl>, `1848` <dbl>,
# `1849` <dbl>, `1850` <dbl>, `1851` <dbl>, `1852` <dbl>, `1853` <dbl>,
# `1854` <dbl>, `1855` <dbl>, `1856` <dbl>, `1857` <dbl>, `1858` <dbl>,
# `1859` <dbl>, `1860` <dbl>, …
OK, we can see that countries make up the rows and the yearly data makes up the columns. We also see that we have a lot of NA values.
We can also use the glimpse() function of the dplyr package to view our data. This allows us to see more of our data at once: a small piece of each variable/column is displayed, with the column names listed on the left.
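The output below corresponds to a call like:

```r
# View every column of the CO2 data, one per line:
CO2_emissions %>%
  dplyr::glimpse()
```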
Rows: 192
Columns: 265
$ country <chr> "Afghanistan", "Albania", "Algeria", "Andorra", "Angola", "An…
$ `1751` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1752` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1753` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1754` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1755` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1756` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1757` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1758` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1759` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1760` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1761` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1762` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1763` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1764` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1765` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1766` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1767` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1768` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1769` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1770` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1771` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1772` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1773` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1774` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1775` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1776` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1777` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1778` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1779` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1780` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1781` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1782` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1783` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1784` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1785` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1786` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1787` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1788` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1789` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1790` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1791` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1792` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1793` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1794` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1795` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1796` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1797` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1798` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1799` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1800` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1801` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1802` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1803` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1804` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1805` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1806` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1807` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 169, NA, NA, NA, NA, NA, …
$ `1808` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1809` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1810` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1811` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1812` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1813` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1814` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1815` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1816` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1817` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1818` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `1819` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 253, NA, NA, NA, NA, NA, …
$ `1820` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 334, NA, NA, NA, NA, NA, …
$ `1821` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 359, NA, NA, NA, NA, NA, …
$ `1822` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 367, NA, NA, NA, NA, NA, …
$ `1823` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 348, NA, NA, NA, NA, NA, …
$ `1824` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, NA, …
$ `1825` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 403, NA, NA, NA, NA, NA, …
$ `1826` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 458, NA, NA, NA, NA, NA, …
$ `1827` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 477, NA, NA, NA, NA, NA, …
$ `1828` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 458, NA, NA, NA, NA, NA, …
$ `1829` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 477, NA, NA, NA, NA, NA, …
$ `1830` <dbl> NA, NA, NA, NA, NA, NA, NA, 0.032, NA, 495.000, 0.308, NA, NA…
$ `1831` <dbl> NA, NA, NA, NA, NA, NA, NA, 3.84e-02, NA, 4.80e+02, 3.70e-01,…
$ `1832` <dbl> NA, NA, NA, NA, NA, NA, NA, 2.56e-02, NA, 5.13e+02, 2.47e-01,…
$ `1833` <dbl> NA, NA, NA, NA, NA, NA, NA, 0.032, NA, 429.000, 0.308, NA, NA…
$ `1834` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 587, NA, NA, NA, NA, NA, …
$ `1835` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 634, NA, NA, NA, NA, NA, …
$ `1836` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 675, NA, NA, NA, NA, NA, …
$ `1837` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 708, NA, NA, NA, NA, NA, …
$ `1838` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 851, NA, NA, NA, NA, NA, …
$ `1839` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1060, NA, NA, NA, NA, NA,…
$ `1840` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1170, NA, NA, NA, NA, NA,…
$ `1841` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1320, NA, NA, NA, NA, NA,…
$ `1842` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1460, NA, NA, NA, NA, NA,…
$ `1843` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1270, NA, NA, NA, NA, NA,…
$ `1844` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1600, NA, NA, NA, NA, NA,…
$ `1845` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1800, NA, NA, NA, NA, NA,…
$ `1846` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2120, NA, NA, NA, NA, NA,…
$ `1847` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2080, NA, NA, NA, NA, NA,…
$ `1848` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2340, NA, NA, NA, NA, NA,…
$ `1849` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2260, NA, NA, NA, NA, NA,…
$ `1850` <dbl> NA, NA, NA, NA, NA, NA, NA, 0.198, NA, 2330.000, 1.910, NA, N…
$ `1851` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2340, NA, NA, NA, NA, NA,…
$ `1852` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2810, NA, NA, NA, NA, NA,…
$ `1853` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 3230, NA, NA, NA, NA, NA,…
$ `1854` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 3180, NA, NA, NA, NA, NA,…
$ `1855` <dbl> NA, NA, NA, NA, NA, NA, NA, 6.01e-01, NA, 3.70e+03, 5.80e+00,…
$ `1856` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 4240, NA, NA, NA, NA, NA,…
$ `1857` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 4880, NA, NA, NA, NA, NA,…
$ `1858` <dbl> NA, NA, NA, NA, NA, NA, NA, 8.44e-01, NA, 7.25e+03, 8.14e+00,…
$ `1859` <dbl> NA, NA, NA, NA, NA, NA, NA, 8.95e-01, NA, 5.87e+03, 8.64e+00,…
$ `1860` <dbl> NA, NA, NA, NA, NA, NA, NA, 1.18, 279.00, 6150.00, 11.40, NA,…
$ `1861` <dbl> NA, NA, NA, NA, NA, NA, NA, 1.5, 510.0, 6380.0, 14.5, NA, NA,…
$ `1862` <dbl> NA, NA, NA, NA, NA, NA, NA, 1.36, 356.00, 6360.00, 13.10, NA,…
$ `1863` <dbl> NA, NA, NA, NA, NA, NA, NA, 1.42, 400.00, 5880.00, 13.70, NA,…
$ `1864` <dbl> NA, NA, NA, NA, NA, NA, NA, 1.59, 268.00, 5080.00, 15.40, NA,…
$ `1865` <dbl> NA, NA, NA, NA, NA, NA, NA, 1.52, 422.00, 5360.00, 14.70, NA,…
$ `1866` <dbl> NA, NA, NA, NA, NA, NA, NA, 4.81, 697.00, 3600.00, 46.40, NA,…
$ `1867` <dbl> NA, NA, NA, NA, NA, NA, NA, 5.52, 895.00, 4920.00, 53.20, NA,…
$ `1868` <dbl> NA, NA, NA, NA, NA, NA, NA, 4.59, 733.00, 6080.00, 44.30, NA,…
$ `1869` <dbl> NA, NA, NA, NA, NA, NA, NA, 6.23, 642.00, 6490.00, 60.10, NA,…
$ `1870` <dbl> NA, NA, NA, NA, NA, NA, NA, 6.76, 601.00, 7370.00, 65.20, NA,…
$ `1871` <dbl> NA, NA, NA, NA, NA, NA, NA, 9.12, 693.00, 10200.00, 88.00, NA…
$ `1872` <dbl> NA, NA, NA, NA, NA, NA, NA, 9.36, 708.00, 10000.00, 90.40, NA…
$ `1873` <dbl> NA, NA, NA, NA, NA, NA, NA, 8.79, 869.00, 10700.00, 84.80, NA…
$ `1874` <dbl> NA, NA, NA, NA, NA, NA, NA, 10.7, 891.0, 9160.0, 103.0, NA, N…
$ `1875` <dbl> NA, NA, NA, NA, NA, NA, NA, 12.3, 829.0, 7870.0, 119.0, NA, N…
$ `1876` <dbl> NA, NA, NA, NA, NA, NA, NA, 15.2, 931.0, 8100.0, 147.0, NA, N…
$ `1877` <dbl> NA, NA, NA, NA, NA, NA, NA, 15.6, 1070.0, 7290.0, 150.0, NA, …
$ `1878` <dbl> NA, NA, NA, NA, NA, NA, NA, 20.3, 968.0, 7250.0, 196.0, NA, N…
$ `1879` <dbl> NA, NA, NA, NA, NA, NA, NA, 20.9, 1460.0, 8870.0, 201.0, NA, …
$ `1880` <dbl> NA, NA, NA, NA, NA, NA, NA, 24.5, 2210.0, 23700.0, 236.0, NA,…
$ `1881` <dbl> NA, NA, NA, NA, NA, NA, NA, 25.80, 1770.00, 10300.00, 249.00,…
$ `1882` <dbl> NA, NA, NA, NA, NA, NA, NA, 27.20, 2010.00, 10600.00, 262.00,…
$ `1883` <dbl> NA, NA, NA, NA, NA, NA, NA, 30.90, 2430.00, 11800.00, 298.00,…
$ `1884` <dbl> NA, NA, NA, NA, NA, NA, NA, 31.4, 2570.0, 11500.0, 303.0, NA,…
$ `1885` <dbl> NA, NA, NA, NA, NA, NA, NA, 34.20, 2910.00, 12100.00, 330.00,…
$ `1886` <dbl> NA, NA, NA, NA, NA, NA, NA, 35.10, 2890.00, 11400.00, 338.00,…
$ `1887` <dbl> NA, NA, NA, NA, NA, NA, 1090.0, 37.1, 3040.0, 12300.0, 358.0,…
$ `1888` <dbl> NA, NA, NA, NA, NA, NA, 891.0, 38.7, 3530.0, 12000.0, 373.0, …
$ `1889` <dbl> NA, NA, NA, NA, NA, NA, 1760.0, 41.8, 3430.0, 12900.0, 403.0,…
$ `1890` <dbl> NA, NA, NA, NA, NA, NA, 1370.0, 47.3, 3550.0, 13000.0, 457.0,…
$ `1891` <dbl> NA, NA, NA, NA, NA, NA, 939.0, 52.1, 4010.0, 15000.0, 503.0, …
$ `1892` <dbl> NA, NA, NA, NA, NA, NA, 1390.0, 55.1, 4150.0, 14500.0, 532.0,…
$ `1893` <dbl> NA, NA, NA, NA, NA, NA, 1550.0, 64.6, 3970.0, 17700.0, 624.0,…
$ `1894` <dbl> NA, NA, NA, NA, NA, NA, 1990.0, 65.8, 4360.0, 18100.0, 635.0,…
$ `1895` <dbl> NA, NA, NA, NA, NA, NA, 2270.0, 75.6, 4590.0, 20400.0, 730.0,…
$ `1896` <dbl> NA, NA, NA, NA, NA, NA, 2310, 77, 4510, 21300, 743, NA, NA, N…
$ `1897` <dbl> NA, NA, NA, NA, NA, NA, 2080, 89, 4980, 23000, 859, NA, NA, N…
$ `1898` <dbl> NA, NA, NA, NA, NA, NA, 2350.0, 99.9, 5620.0, 24500.0, 964.0,…
$ `1899` <dbl> NA, NA, NA, NA, NA, NA, 2920, 116, 5790, 24800, 1120, NA, NA,…
$ `1900` <dbl> NA, NA, NA, NA, NA, NA, 2070, 131, 10200, 27700, 1270, NA, NA…
$ `1901` <dbl> NA, NA, NA, NA, NA, NA, 2490, 135, 11400, 28400, 1300, NA, NA…
$ `1902` <dbl> NA, NA, NA, NA, NA, NA, 2820, 130, 11400, 25700, 1260, NA, NA…
$ `1903` <dbl> NA, NA, NA, NA, NA, NA, 2860, 127, 11200, 25600, 1230, NA, NA…
$ `1904` <dbl> NA, NA, NA, NA, NA, NA, 3800, 142, 11600, 26900, 1370, NA, NA…
$ `1905` <dbl> NA, NA, NA, NA, NA, NA, 3990, 126, 12100, 28100, 1220, NA, NA…
$ `1906` <dbl> NA, NA, NA, NA, NA, NA, 6260, 144, 14400, 33600, 1390, NA, NA…
$ `1907` <dbl> NA, NA, NA, NA, NA, NA, 6260, 161, 15500, 42200, 1560, NA, NA…
$ `1908` <dbl> NA, NA, NA, NA, NA, NA, 7620, 162, 16800, 59000, 1570, NA, NA…
$ `1909` <dbl> NA, NA, NA, NA, NA, NA, 5940, 172, 14600, 42200, 1660, NA, NA…
$ `1910` <dbl> NA, NA, NA, NA, NA, NA, 8910, 168, 17500, 57600, 1620, NA, NA…
$ `1911` <dbl> NA, NA, NA, NA, NA, NA, 9950, 174, 19300, 48100, 1680, NA, NA…
$ `1912` <dbl> NA, NA, NA, NA, NA, NA, 9490, 198, 20800, 50000, 1910, NA, NA…
$ `1913` <dbl> NA, NA, NA, NA, NA, NA, 10200, 215, 22400, 59700, 2070, NA, N…
$ `1914` <dbl> NA, NA, NA, NA, NA, NA, 8680, 194, 24500, 48900, 1870, NA, NA…
$ `1915` <dbl> NA, NA, NA, NA, NA, NA, 6950, 178, 21800, 34900, 1720, NA, NA…
$ `1916` <dbl> NA, NA, 3.67, NA, NA, NA, 4990.00, 189.00, 19300.00, 8040.00,…
$ `1917` <dbl> NA, NA, 7.33, NA, NA, NA, 2230.00, 174.00, 20800.00, 3450.00,…
$ `1918` <dbl> NA, NA, 18.3, NA, NA, NA, 2520.0, 69.5, 23000.0, 3340.0, 671.…
$ `1919` <dbl> NA, NA, 18.3, NA, NA, NA, 3730.0, 59.4, 21800.0, 3020.0, 573.…
$ `1920` <dbl> NA, NA, 22.0, NA, NA, NA, 5900.0, 54.2, 25800.0, 14500.0, 523…
$ `1921` <dbl> NA, NA, 25.7, NA, NA, NA, 5540.0, 58.7, 23200.0, 19400.0, 567…
$ `1922` <dbl> NA, NA, 25.7, NA, NA, NA, 7300.0, 71.6, 24400.0, 18600.0, 692…
$ `1923` <dbl> NA, NA, 14.7, NA, NA, NA, 8450.0, 79.1, 24900.0, 17800.0, 764…
$ `1924` <dbl> NA, NA, 29.3, NA, NA, NA, 11000.0, 94.3, 27100.0, 20100.0, 91…
$ `1925` <dbl> NA, NA, 33.0, NA, NA, NA, 11200.0, 93.1, 28300.0, 19000.0, 89…
$ `1926` <dbl> NA, NA, 40.3, NA, NA, NA, 11300.0, 135.0, 27900.0, 18600.0, 1…
$ `1927` <dbl> NA, NA, 58.7, NA, NA, NA, 13400.0, 168.0, 28900.0, 20100.0, 1…
$ `1928` <dbl> NA, NA, 73.30, NA, NA, NA, 12800.00, 186.00, 26300.00, 21200.…
$ `1929` <dbl> NA, NA, 80.70, NA, NA, NA, 13100.00, 201.00, 23700.00, 24200.…
$ `1930` <dbl> NA, NA, 84.30, NA, NA, NA, 12800.00, 273.00, 22000.00, 18900.…
$ `1931` <dbl> NA, NA, 99.00, NA, NA, NA, 12900.00, 328.00, 19600.00, 18100.…
$ `1932` <dbl> NA, NA, 114.00, NA, NA, NA, 13100.00, 369.00, 20400.00, 15200…
$ `1933` <dbl> NA, 7.33, 121.00, NA, NA, NA, 13200.00, 412.00, 21600.00, 142…
$ `1934` <dbl> NA, 7.33, 139.00, NA, NA, NA, 14300.00, 499.00, 22700.00, 138…
$ `1935` <dbl> NA, 18.3, 132.0, NA, NA, NA, 14000.0, 565.0, 25300.0, 13900.0…
$ `1936` <dbl> NA, 128.0, 51.3, NA, NA, NA, 15100.0, 648.0, 27100.0, 13600.0…
$ `1937` <dbl> NA, 297.0, 69.7, NA, NA, NA, 16700.0, 662.0, 28900.0, 15300.0…
$ `1938` <dbl> NA, 348, 33, NA, NA, NA, 16400, 699, 28100, 5790, 6750, NA, 3…
$ `1939` <dbl> NA, 433.00, 161.00, NA, NA, NA, 17400.00, 707.00, 32200.00, 6…
$ `1940` <dbl> NA, 693, 238, NA, NA, NA, 15900, 848, 29100, 7350, 8190, NA, …
$ `1941` <dbl> NA, 627, 312, NA, NA, NA, 14000, 745, 34600, 7980, 7190, NA, …
$ `1942` <dbl> NA, 744, 499, NA, NA, NA, 13500, 513, 36500, 8560, 4950, NA, …
$ `1943` <dbl> NA, 462, 469, NA, NA, NA, 14100, 655, 35000, 9620, 6320, NA, …
$ `1944` <dbl> NA, 154, 499, NA, NA, NA, 14000, 613, 34200, 9400, 5920, NA, …
$ `1945` <dbl> NA, 121, 616, NA, NA, NA, 13700, 649, 32700, 4570, 6270, NA, …
$ `1946` <dbl> NA, 484, 763, NA, NA, NA, 13700, 730, 35500, 12800, 7040, NA,…
$ `1947` <dbl> NA, 928.00, 744.00, NA, NA, NA, 14500.00, 878.00, 38000.00, 1…
$ `1948` <dbl> NA, 704.00, 803.00, NA, NA, NA, 17400.00, 935.00, 38500.00, 2…
$ `1949` <dbl> 14.70, 1020.00, 909.00, NA, NA, NA, 15400.00, 1060.00, 37700.…
$ `1950` <dbl> 84.3, 297.0, 3790.0, NA, 187.0, NA, 30000.0, 1180.0, 54800.0,…
$ `1951` <dbl> 91.7, 403.0, 4140.0, NA, 249.0, NA, 35000.0, 1280.0, 59100.0,…
$ `1952` <dbl> 91.7, 374.0, 3890.0, NA, 312.0, NA, 36100.0, 1370.0, 60300.0,…
$ `1953` <dbl> 106.0, 414.0, 4000.0, NA, 275.0, NA, 35200.0, 1450.0, 59500.0…
$ `1954` <dbl> 106.0, 502.0, 4160.0, NA, 348.0, NA, 36800.0, 1590.0, 67900.0…
$ `1955` <dbl> 154.0, 664.0, 4610.0, NA, 414.0, NA, 39600.0, 1800.0, 70700.0…
$ `1956` <dbl> 183.0, 840.0, 5000.0, NA, 502.0, NA, 44300.0, 1970.0, 73100.0…
$ `1957` <dbl> 293.0, 1510.0, 5540.0, NA, 620.0, 22.0, 47700.0, 2160.0, 7460…
$ `1958` <dbl> 330.0, 1200.0, 5220.0, NA, 594.0, 29.3, 44200.0, 2310.0, 7770…
$ `1959` <dbl> 385.0, 1440.0, 5670.0, NA, 620.0, 29.3, 49000.0, 2430.0, 8380…
$ `1960` <dbl> 414.0, 2020.0, 6160.0, NA, 550.0, 36.7, 48800.0, 2530.0, 8820…
$ `1961` <dbl> 491.0, 2280.0, 6070.0, NA, 455.0, 47.7, 51200.0, 2600.0, 9060…
$ `1962` <dbl> 689.0, 2460.0, 5670.0, NA, 1180.0, 103.0, 53700.0, 2730.0, 94…
$ `1963` <dbl> 708.0, 2080.0, 5430.0, NA, 1150.0, 84.3, 50100.0, 2930.0, 101…
$ `1964` <dbl> 840.0, 2020.0, 5650.0, NA, 1220.0, 91.7, 55700.0, 3120.0, 109…
$ `1965` <dbl> 1010.0, 2170.0, 6600.0, NA, 1190.0, 150.0, 58900.0, 3310.0, 1…
$ `1966` <dbl> 1090.0, 2550.0, 8430.0, NA, 1550.0, 348.0, 63100.0, 3490.0, 1…
$ `1967` <dbl> 1280, 2680, 8440, NA, 994, 565, 65500, 3650, 129000, 40000, 3…
$ `1968` <dbl> 1220, 3070, 9060, NA, 1670, 990, 69100, 3750, 135000, 42400, …
$ `1969` <dbl> 942, 3250, 11300, NA, 2790, 1260, 77300, 3910, 142000, 44700,…
$ `1970` <dbl> 1.67e+03, 3.74e+03, 1.51e+04, NA, 3.58e+03, 4.62e+02, 8.27e+0…
$ `1971` <dbl> 1.90e+03, 4.35e+03, 1.87e+04, NA, 3.41e+03, 4.25e+02, 8.89e+0…
$ `1972` <dbl> 1.53e+03, 5.64e+03, 2.83e+04, NA, 4.51e+03, 3.74e+02, 9.02e+0…
$ `1973` <dbl> 1.64e+03, 5.29e+03, 3.83e+04, NA, 4.88e+03, 3.30e+02, 9.41e+0…
$ `1974` <dbl> 1.92e+03, 4.35e+03, 3.19e+04, NA, 4.87e+03, 4.29e+02, 9.56e+0…
$ `1975` <dbl> 2.13e+03, 4.59e+03, 3.20e+04, NA, 4.42e+03, 7.08e+02, 9.49e+0…
$ `1976` <dbl> 1.99e+03, 4.95e+03, 3.92e+04, NA, 3.29e+03, 4.03e+02, 9.98e+0…
$ `1977` <dbl> 2.39e+03, 5.72e+03, 4.19e+04, NA, 3.53e+03, 4.66e+02, 1.01e+0…
$ `1978` <dbl> 2160, 6490, 62500, NA, 5410, 491, 103000, 5810, 202000, 57500…
$ `1979` <dbl> 2240, 7590, 45600, NA, 5500, 407, 111000, 5850, 205000, 61600…
$ `1980` <dbl> 1760, 5170, 66500, NA, 5350, 143, 109000, 6080, 221000, 52300…
$ `1981` <dbl> 1980.0, 7340.0, 46400.0, NA, 5280.0, 106.0, 102000.0, 5970.0,…
$ `1982` <dbl> 2100, 7310, 39300, NA, 4650, 293, 103000, 6080, 234000, 53900…
$ `1983` <dbl> 2520.0, 7630.0, 52600.0, NA, 5120.0, 84.3, 105000.0, 6170.0, …
$ `1984` <dbl> 2830.0, 7830.0, 71100.0, NA, 5010.0, 147.0, 107000.0, 6230.0,…
$ `1985` <dbl> 3510.0, 7880.0, 72800.0, NA, 4700.0, 249.0, 101000.0, 6710.0,…
$ `1986` <dbl> 3140, 8060, 76300, NA, 4660, 249, 104000, 6730, 240000, 54100…
$ `1987` <dbl> 3120, 7440, 84100, NA, 5820, 275, 115000, 7020, 256000, 57700…
$ `1988` <dbl> 2870, 7330, 83900, NA, 5130, 286, 121000, 7210, 261000, 53300…
$ `1989` <dbl> 2780.0, 8980.0, 80000.0, NA, 5010.0, 286.0, 117000.0, 7060.0,…
$ `1990` <dbl> 2610, 5520, 77000, 407, 5120, 282, 112000, 6620, 264000, 5770…
$ `1991` <dbl> 2440, 4290, 79000, 407, 5090, 268, 117000, 6380, 261000, 6160…
$ `1992` <dbl> 1390, 2520, 80100, 407, 5200, 264, 121000, 5830, 268000, 5670…
$ `1993` <dbl> 1350, 2340, 82200, 411, 5780, 271, 118000, 2560, 277000, 5710…
$ `1994` <dbl> 1290, 1930, 86400, 407, 3890, 268, 122000, 2710, 278000, 5710…
$ `1995` <dbl> 1240, 2090, 95300, 425, 11000, 275, 128000, 3410, 282000, 598…
$ `1996` <dbl> 1180, 2020, 97100, 455, 10500, 293, 135000, 2560, 302000, 632…
$ `1997` <dbl> 1100, 1540, 87300, 466, 7380, 308, 138000, 3230, 306000, 6270…
$ `1998` <dbl> 1040, 1750, 107000, 491, 7310, 319, 140000, 3360, 317000, 637…
$ `1999` <dbl> 821, 2980, 92000, 513, 9160, 330, 147000, 3010, 325000, 61900…
$ `2000` <dbl> 774, 3020, 87900, 524, 9540, 345, 142000, 3470, 329000, 62300…
$ `2001` <dbl> 818, 3220, 84200, 524, 9730, 348, 134000, 3540, 325000, 65900…
$ `2002` <dbl> 1070, 3750, 89900, 532, 12700, 370, 125000, 3040, 341000, 671…
$ `2003` <dbl> 1200, 4290, 91600, 535, 9060, 403, 135000, 3430, 336000, 7220…
$ `2004` <dbl> 950, 4170, 88500, 561, 18800, 422, 158000, 3640, 343000, 7240…
$ `2005` <dbl> 1330, 4250, 107000, 576, 19200, 429, 162000, 4350, 350000, 74…
$ `2006` <dbl> 1650, 3900, 101000, 546, 22300, 444, 175000, 4380, 365000, 72…
$ `2007` <dbl> 2270, 3930, 109000, 539, 25200, 469, 175000, 5060, 372000, 69…
$ `2008` <dbl> 4210, 4370, 110000, 539, 25700, 480, 189000, 5560, 386000, 69…
$ `2009` <dbl> 6770, 4380, 121000, 517, 27800, 510, 180000, 4360, 395000, 62…
$ `2010` <dbl> 8460, 4600, 119000, 517, 29100, 524, 188000, 4220, 391000, 67…
$ `2011` <dbl> 12200, 5240, 121000, 491, 30300, 513, 192000, 4920, 392000, 6…
$ `2012` <dbl> 10800, 4910, 130000, 488, 33400, 524, 192000, 5690, 388000, 6…
$ `2013` <dbl> 10000, 5060, 134000, 477, 32600, 524, 190000, 5500, 372000, 6…
$ `2014` <dbl> 9810, 5720, 145000, 462, 34800, 532, 204000, 5530, 361000, 58…
We can see that we have a large tibble. A tibble is the tidyverse version of a data frame. It is essentially a table with variable information arranged as columns and individual observations arranged as rows. We can see that the tibble gives us information about the class of each variable. For example, the country variable is made up of character (abbreviated as chr) values. We have 192 different countries and CO2 emission values for 264 different years (from 1751 to 2014). Recall that the values are emissions in metric tons, also called tonnes. We can see that there are fewer NA values for later years.
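As a quick sanity check, we could count the missing values in each year column to confirm that later years have fewer NAs. This is only a sketch; it assumes CO2_emissions is still in its original wide form, with one column per year:

```r
library(dplyr)

# Count the NA values in every year column (everything except country);
# later years should show smaller counts
na_per_year <- CO2_emissions %>%
  summarise(across(-country, ~ sum(is.na(.x))))
```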
Now we will modify this data to make it more usable for making visualizations. One thing we will use is the %<>% operator, which is from the magrittr package. This allows us to use our CO2_emissions data and reassign it to a modified version at the same time.
We will use the pivot_longer() function of the tidyr package to convert our data into what is called long format. This means that we will have more rows and fewer columns than in our current format, which is done by collapsing multiple variables into fewer variables.
We want to collapse all of the values for the emission data across the different individual year variables into one new emission variable and we will identify what year they are from using a new Year variable.
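As a minimal illustration of what pivot_longer() does, here is a sketch on a tiny made-up tibble (hypothetical data, not the real CO2 data):

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per country, one column per year
toy <- tibble(country = c("A", "B"),
              `1900`  = c(10, 20),
              `1901`  = c(11, 21))

# Collapse the year columns into a Year/Emissions pair of variables
toy %>%
  pivot_longer(cols = -country,
               names_to = "Year",
               values_to = "Emissions")
# 4 rows: one per country-year combination
```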
CO2_emissions %<>%
pivot_longer(cols = -country, names_to = "Year", values_to = "Emissions")
CO2_emissions %>%
slice_sample(n = 6)
# A tibble: 6 x 3
country Year Emissions
<chr> <chr> <dbl>
1 Syria 1971 8890
2 Sri Lanka 1992 5190
3 Palestine 1800 NA
4 Chad 2002 169
5 Panama 1869 NA
6 Greece 1939 253
We also want to rename the country variable to be capitalized. We can use the rename() function of the dplyr package to rename this variable. When renaming variables, the new name is listed first, before the =. We could also rescale the Emissions data (for example, dividing it by 1000 to make the numbers smaller) using the mutate() function, which is also part of the dplyr package and allows us to create and modify variables; here we will leave the values in metric tons. You may also note that the Year variable is currently of class character. We would like to change it to be numeric, which can also be accomplished using the mutate() function.
CO2_emissions %<>%
dplyr::rename(Country=country) %>%
# dplyr::mutate(Emissions = Emissions/1000,
dplyr::mutate(Year = as.numeric(Year),
Label = "CO2 Emissions (Metric Tons)")
#rename(`CO2 Emissions (Mg)`= Emissions)
Now let’s take a look to see how our data has changed:
# A tibble: 6 x 4
Country Year Emissions Label
<chr> <dbl> <dbl> <chr>
1 Benin 1958 125 CO2 Emissions (Metric Tons)
2 Bangladesh 1795 NA CO2 Emissions (Metric Tons)
3 Mongolia 1887 NA CO2 Emissions (Metric Tons)
4 Guyana 1968 1330 CO2 Emissions (Metric Tons)
5 Gambia 2008 367 CO2 Emissions (Metric Tons)
6 Botswana 1860 NA CO2 Emissions (Metric Tons)
Great, we can see that now the Year variable is of class double (abbreviated dbl), which is a numeric class.
Now let’s take a look at the Country variable just to check if there is anything unexpected. We will use the distinct() function of the dplyr package to view the unique values only.
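The unique values shown below can be obtained with a call along these lines (a sketch; we assume the long-format CO2_emissions from above):

```r
# View only the unique Country values
CO2_emissions %>%
  dplyr::distinct(Country)
```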
# A tibble: 192 x 1
Country
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 Andorra
5 Angola
6 Antigua and Barbuda
7 Argentina
8 Armenia
9 Australia
10 Austria
# … with 182 more rows
These all look as expected!
Now let’s take a look at the GDP growth per capita data:
# A tibble: 6 x 220
country `1801` `1802` `1803` `1804` `1805` `1806` `1807`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghan… NA NA NA NA NA NA NA
2 Albania 0.104 0.104 0.104 0.104 0.104 0.104 0.104
3 Algeria -0.00247 -0.00247 -0.00247 -0.00247 -0.00247 -0.00247 -0.00247
4 Andorra 0.166 0.166 0.166 0.166 0.166 0.166 0.166
5 Angola 0.425 0.425 0.425 0.425 0.425 0.425 0.425
6 Antigu… NA NA NA NA NA NA NA
# … with 212 more variables: `1808` <dbl>, `1809` <dbl>, `1810` <dbl>,
# `1811` <dbl>, `1812` <dbl>, `1813` <dbl>, `1814` <dbl>, `1815` <dbl>,
# `1816` <dbl>, `1817` <dbl>, `1818` <dbl>, `1819` <dbl>, `1820` <dbl>,
# `1821` <dbl>, `1822` <dbl>, `1823` <dbl>, `1824` <dbl>, `1825` <dbl>,
# `1826` <dbl>, `1827` <dbl>, `1828` <dbl>, `1829` <dbl>, `1830` <dbl>,
# `1831` <dbl>, `1832` <dbl>, `1833` <dbl>, `1834` <dbl>, `1835` <dbl>,
# `1836` <dbl>, `1837` <dbl>, `1838` <dbl>, `1839` <dbl>, `1840` <dbl>,
# `1841` <dbl>, `1842` <dbl>, `1843` <dbl>, `1844` <dbl>, `1845` <dbl>,
# `1846` <dbl>, `1847` <dbl>, `1848` <dbl>, `1849` <dbl>, `1850` <dbl>,
# `1851` <dbl>, `1852` <dbl>, `1853` <dbl>, `1854` <dbl>, `1855` <dbl>,
# `1856` <dbl>, `1857` <dbl>, `1858` <dbl>, `1859` <dbl>, `1860` <dbl>,
# `1861` <dbl>, `1862` <dbl>, `1863` <dbl>, `1864` <dbl>, `1865` <dbl>,
# `1866` <dbl>, `1867` <dbl>, `1868` <dbl>, `1869` <dbl>, `1870` <dbl>,
# `1871` <dbl>, `1872` <dbl>, `1873` <dbl>, `1874` <dbl>, `1875` <dbl>,
# `1876` <dbl>, `1877` <dbl>, `1878` <dbl>, `1879` <dbl>, `1880` <dbl>,
# `1881` <dbl>, `1882` <dbl>, `1883` <dbl>, `1884` <dbl>, `1885` <dbl>,
# `1886` <dbl>, `1887` <dbl>, `1888` <dbl>, `1889` <dbl>, `1890` <dbl>,
# `1891` <dbl>, `1892` <dbl>, `1893` <dbl>, `1894` <dbl>, `1895` <dbl>,
# `1896` <dbl>, `1897` <dbl>, `1898` <dbl>, `1899` <dbl>, `1900` <dbl>,
# `1901` <dbl>, `1902` <dbl>, `1903` <dbl>, `1904` <dbl>, `1905` <dbl>,
# `1906` <dbl>, `1907` <dbl>, …
[1] "country" "1801" "1802" "1803" "1804" "1805" "1806"
[8] "1807" "1808" "1809" "1810" "1811" "1812" "1813"
[15] "1814" "1815" "1816" "1817" "1818" "1819" "1820"
[22] "1821" "1822" "1823" "1824" "1825" "1826" "1827"
[29] "1828" "1829" "1830" "1831" "1832" "1833" "1834"
[36] "1835" "1836" "1837" "1838" "1839" "1840" "1841"
[43] "1842" "1843" "1844" "1845" "1846" "1847" "1848"
[50] "1849" "1850" "1851" "1852" "1853" "1854" "1855"
[57] "1856" "1857" "1858" "1859" "1860" "1861" "1862"
[64] "1863" "1864" "1865" "1866" "1867" "1868" "1869"
[71] "1870" "1871" "1872" "1873" "1874" "1875" "1876"
[78] "1877" "1878" "1879" "1880" "1881" "1882" "1883"
[85] "1884" "1885" "1886" "1887" "1888" "1889" "1890"
[92] "1891" "1892" "1893" "1894" "1895" "1896" "1897"
[99] "1898" "1899" "1900" "1901" "1902" "1903" "1904"
[106] "1905" "1906" "1907" "1908" "1909" "1910" "1911"
[113] "1912" "1913" "1914" "1915" "1916" "1917" "1918"
[120] "1919" "1920" "1921" "1922" "1923" "1924" "1925"
[127] "1926" "1927" "1928" "1929" "1930" "1931" "1932"
[134] "1933" "1934" "1935" "1936" "1937" "1938" "1939"
[141] "1940" "1941" "1942" "1943" "1944" "1945" "1946"
[148] "1947" "1948" "1949" "1950" "1951" "1952" "1953"
[155] "1954" "1955" "1956" "1957" "1958" "1959" "1960"
[162] "1961" "1962" "1963" "1964" "1965" "1966" "1967"
[169] "1968" "1969" "1970" "1971" "1972" "1973" "1974"
[176] "1975" "1976" "1977" "1978" "1979" "1980" "1981"
[183] "1982" "1983" "1984" "1985" "1986" "1987" "1988"
[190] "1989" "1990" "1991" "1992" "1993" "1994" "1995"
[197] "1996" "1997" "1998" "1999" "2000" "2001" "2002"
[204] "2003" "2004" "2005" "2006" "2007" "2008" "2009"
[211] "2010" "2011" "2012" "2013" "2014" "2015" "2016"
[218] "2017" "2018" "2019"
Rows: 194
Columns: 220
$ country <chr> "Afghanistan", "Albania", "Algeria", "Andorra", "Angola", "An…
$ `1801` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1802` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1803` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1804` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1805` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1806` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1807` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1808` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1809` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1810` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1811` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1812` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1813` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1814` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1815` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1816` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1817` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1818` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1819` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1820` <dbl> NA, 0.10400, -0.00247, 0.16600, 0.42500, NA, NA, NA, 0.21600,…
$ `1821` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1822` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1823` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1824` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1825` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1826` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1827` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1828` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1829` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1830` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1831` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1832` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1833` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1834` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1835` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1836` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1837` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1838` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1839` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1840` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1841` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1842` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1843` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1844` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1845` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1846` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1847` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1848` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1849` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1850` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1851` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1852` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1853` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1854` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1855` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1856` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1857` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1858` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1859` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1860` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1861` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1862` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1863` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1864` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1865` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1866` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1867` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1868` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1869` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1870` <dbl> 0.32500, 0.21300, 1.02000, 1.17000, 0.42500, 0.66100, 1.41000…
$ `1871` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 1.410, 0.371, 0.772…
$ `1872` <dbl> 0.3250, 1.4700, 1.1400, 1.1700, 0.4250, 0.6610, 1.4100, 0.371…
$ `1873` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 1.410, 0.371, 7.600…
$ `1874` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 1.410, 0.371, 0.292…
$ `1875` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 1.410, 0.371, 7.910…
$ `1876` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, -0.952, 0.371, -3.1…
$ `1877` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 7.210, 0.371, 0.720…
$ `1878` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, -8.110, 0.371, 5.98…
$ `1879` <dbl> 0.3250, 1.4700, 1.1400, 1.1700, 0.4250, 0.6610, 1.2700, 0.371…
$ `1880` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, -5.020, 0.371, 1.91…
$ `1881` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, -1.500, 0.371, 3.95…
$ `1882` <dbl> 0.3250, 1.4700, 1.1400, 1.1700, 0.4250, 0.6610, 22.8000, 0.37…
$ `1883` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 9.000, 0.371, 10.20…
$ `1884` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 3.980, 0.371, -3.84…
$ `1885` <dbl> 0.3250, 1.4700, 1.1400, 1.1700, 0.4250, 0.6610, 14.2000, 0.37…
$ `1886` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, -2.700, -4.080, -2.…
$ `1887` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 3.690, 16.700, 6.98…
$ `1888` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 12.900, -4.020, -2.…
$ `1889` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, 6.590, -7.220, 5.40…
$ `1890` <dbl> 0.325, 1.470, 1.140, 1.170, 0.425, 0.661, -11.300, -0.635, -6…
$ `1891` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, -8.460, -8.610, 4.6…
$ `1892` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, 16.300, 9.230, -14.…
$ `1893` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, 2.840, 13.100, -7.2…
$ `1894` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, 12.200, 13.400, 1.5…
$ `1895` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, 7.760, -7.670, -7.3…
$ `1896` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, 7.520, 9.890, 5.670…
$ `1897` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, -21.900, -1.890, -7…
$ `1898` <dbl> 0.3250, 1.3700, 1.1400, 1.1700, 0.4250, 0.6610, 5.4600, 2.430…
$ `1899` <dbl> 0.325, 1.370, 1.140, 1.170, 0.425, 0.661, 14.700, 5.880, -1.3…
$ `1900` <dbl> 0.3250, 1.3700, 1.1400, 1.1700, 0.4250, 0.6610, -14.8000, -2.…
$ `1901` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, 5.620, 2.270, -4.35…
$ `1902` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, -4.850, 8.420, -0.4…
$ `1903` <dbl> 0.3250, 1.3100, 1.1400, 1.1700, 0.4250, 0.6610, 11.5000, -7.1…
$ `1904` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, 7.830, 10.300, 5.36…
$ `1905` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, 10.400, -11.700, -0…
$ `1906` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, 0.392, -4.640, 5.26…
$ `1907` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, -2.530, -4.130, 2.4…
$ `1908` <dbl> 0.3250, 1.3100, 1.1400, 1.1700, 0.4250, 0.6610, 5.1600, 8.960…
$ `1909` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, 0.294, 3.590, 6.130…
$ `1910` <dbl> 0.325, 1.310, 1.140, 1.170, 0.425, 0.661, 2.640, 6.460, 4.600…
$ `1911` <dbl> 0.32500, 1.28000, 1.14000, 1.17000, 0.42500, 0.66100, -2.2000…
$ `1912` <dbl> 0.32500, 1.28000, 1.14000, 1.17000, 0.42500, 0.66100, 4.17000…
$ `1913` <dbl> 0.32500, 1.28000, 1.14000, 1.17000, 0.42500, 0.66100, -2.9600…
$ `1914` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, -14.300, -4.720, -2…
$ `1915` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, -3.470, 2.690, -2.5…
$ `1916` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, -4.610, -3.150, -0.…
$ `1917` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, -9.830, -16.700, -1…
$ `1918` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 16.600, -16.700, -3…
$ `1919` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 1.950, -16.700, 2.4…
$ `1920` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 5.550, -5.070, 1.06…
$ `1921` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, -0.488, -5.070, 3.0…
$ `1922` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 4.950, 8.450, 3.110…
$ `1923` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 7.970, 8.450, 2.540…
$ `1924` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 4.750, 8.450, 4.340…
$ `1925` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, -3.460, 8.450, 2.50…
$ `1926` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 2.080, 8.450, 0.351…
$ `1927` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 4.350, 8.450, -0.51…
$ `1928` <dbl> 0.46300, 0.83700, 0.43200, 3.80000, 2.96000, 2.45000, 3.45000…
$ `1929` <dbl> 0.463, 0.837, 0.432, 3.800, 2.960, 2.450, 1.860, 8.450, -3.46…
$ `1930` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -6.880, 4.280, -10.…
$ `1931` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -8.810, 0.680, -7.5…
$ `1932` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -5.180, -1.760, 4.8…
$ `1933` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 2.830, 3.530, 6.110…
$ `1934` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 6.020, 8.930, 4.490…
$ `1935` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 2.480, 14.100, 5.09…
$ `1936` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -0.737, 6.560, 3.73…
$ `1937` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 5.680, 8.050, 4.170…
$ `1938` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -1.260, -0.505, 2.4…
$ `1939` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 2.260, 3.820, -0.69…
$ `1940` <dbl> 0.4630, 0.3720, 0.4320, 3.8000, 2.9600, 2.4500, 0.0522, -4.38…
$ `1941` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 3.670, -2.100, 10.1…
$ `1942` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -0.457, -2.100, 10.…
$ `1943` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -2.240, -2.100, 2.6…
$ `1944` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 9.710, -2.100, -4.4…
$ `1945` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -4.770, -2.100, -6.…
$ `1946` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 6.610, -2.100, -4.6…
$ `1947` <dbl> 0.4630, 0.3720, 0.4320, 3.8000, 2.9600, 2.4500, 8.8000, 10.90…
$ `1948` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, 3.160, 12.700, 4.55…
$ `1949` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -3.620, 8.980, 3.87…
$ `1950` <dbl> 0.463, 0.372, 0.432, 3.800, 2.960, 2.450, -1.110, 8.070, 2.42…
$ `1951` <dbl> 1.250, 4.320, -1.300, 3.800, 2.320, 2.450, 1.750, -1.470, 1.2…
$ `1952` <dbl> 1.660, 0.160, 2.150, 3.800, 2.320, 2.450, -7.090, 4.430, -1.1…
$ `1953` <dbl> 4.290, 4.040, -0.517, 3.800, 2.220, 2.450, 3.320, 2.360, 1.18…
$ `1954` <dbl> 0.3080, 2.9400, 4.9900, 3.8000, -4.1100, 2.4500, 2.2100, 2.87…
$ `1955` <dbl> 0.129, 5.390, 0.573, 3.800, 6.410, 2.450, 5.200, 6.420, 3.040…
$ `1956` <dbl> 2.530, 1.010, 7.460, 3.800, -3.350, 2.450, 0.962, 7.410, 1.01…
$ `1957` <dbl> -1.9400, 6.4100, 9.0300, 3.8000, 7.8600, 2.4500, 3.4200, 0.04…
$ `1958` <dbl> 3.520, 4.500, 1.510, 3.800, 3.620, 2.450, 4.390, 5.390, 2.660…
$ `1959` <dbl> 0.764, 4.120, 16.000, 3.800, -1.230, 2.450, -8.140, -3.080, 3…
$ `1960` <dbl> 1.430, 5.060, 4.750, 3.800, 2.260, 2.450, 6.220, 7.290, 1.880…
$ `1961` <dbl> -1.280, 0.831, -13.800, 3.800, 11.400, 2.450, 5.480, 3.870, -…
$ `1962` <dbl> -0.497, 3.310, -20.400, 3.800, -4.360, 2.450, -3.180, 1.030, …
$ `1963` <dbl> -0.429, 3.410, 23.400, 3.800, 3.380, 2.450, -3.930, -3.740, 4…
$ `1964` <dbl> -0.374, 3.420, 2.160, 3.800, 9.360, 2.450, 8.760, 11.400, 4.7…
$ `1965` <dbl> -0.124, 3.650, 3.530, 3.800, 5.710, 2.450, 7.640, 4.390, 3.08…
$ `1966` <dbl> -1.370, 3.750, -7.730, 3.800, 4.040, 2.450, -0.829, 3.670, 0.…
$ `1967` <dbl> 0.310, 3.760, 5.740, 3.800, 4.040, 2.450, 1.220, 3.310, 4.800…
$ `1968` <dbl> 1.06, 3.61, 8.40, 3.80, -3.19, 2.45, 2.87, 4.81, 3.87, 3.90, …
$ `1969` <dbl> -0.883, 3.400, 6.480, 3.800, 1.150, 2.450, 7.080, 0.450, 3.70…
$ `1970` <dbl> -0.514, 3.690, 6.830, 3.800, 4.510, 2.450, 3.850, 6.690, 4.00…
$ `1971` <dbl> -7.160, 4.000, -11.100, -0.603, 3.880, 4.800, 2.130, 1.650, 2…
$ `1972` <dbl> -4.360, 3.920, 17.500, 2.760, -1.970, 4.720, 0.370, -0.430, 0…
$ `1973` <dbl> 8.580, 4.980, 0.285, 2.630, 5.680, 6.090, 2.020, 7.380, 3.820…
$ `1974` <dbl> 2.750, 0.373, 3.010, 0.870, 0.714, 1.700, 3.700, 1.070, 0.831…
$ `1975` <dbl> 2.5300, 0.3480, 3.8900, -3.5900, -7.3700, -6.2900, -2.2400, -…
$ `1976` <dbl> 2.330, 0.400, 3.420, -0.531, -7.630, -8.910, -1.580, 2.850, 2…
$ `1977` <dbl> -9.2400, 0.4380, 5.7700, -0.6340, -1.8700, 8.2100, 4.8600, 0.…
$ `1978` <dbl> 5.210, 0.460, 9.420, -1.920, -7.870, 4.990, -4.730, 0.760, 1.…
$ `1979` <dbl> -2.180, 0.487, 5.750, -3.670, -2.600, 12.600, 5.440, -2.140, …
$ `1980` <dbl> 0.1680, 0.7060, -1.2600, -2.1000, -0.4710, 8.3500, 0.0163, -1…
$ `1981` <dbl> 10.700, 0.536, -0.656, -4.970, -7.610, 6.330, -6.960, -0.756,…
$ `1982` <dbl> 9.0500, 0.5500, 3.0800, -4.0500, -3.5200, 1.5500, -4.5000, 0.…
$ `1983` <dbl> 3.59000, 0.58400, 1.89000, -3.47000, 0.52000, 8.03000, 2.6700…
$ `1984` <dbl> -1.830, 0.569, 2.270, -2.810, 2.440, 8.850, 0.681, -0.516, 5.…
$ `1985` <dbl> -3.280, 0.523, 2.020, -1.370, 0.316, 9.540, -8.340, -0.847, 3…
$ `1986` <dbl> 7.170, 0.635, -3.790, 0.612, 0.332, 10.700, 5.880, 2.330, 0.6…
$ `1987` <dbl> -17.3000, 0.6290, -3.3100, 3.7300, 4.3100, 11.3000, 1.2500, -…
$ `1988` <dbl> -9.660, 0.633, -4.670, 3.540, 3.020, 10.000, -3.220, 0.453, 2…
$ `1989` <dbl> -2.410, 0.754, 0.771, 2.780, -2.140, 7.540, -8.340, 0.135, 2.…
$ `1990` <dbl> -5.5800, 0.8930, -3.9100, 0.8110, -3.1700, 3.2300, -3.2200, -…
$ `1991` <dbl> -0.572, -28.900, -3.490, -1.470, -2.030, 1.540, 9.290, -13.30…
$ `1992` <dbl> -7.950, -8.100, -0.752, -3.740, -8.830, -0.632, 8.540, -40.80…
$ `1993` <dbl> -13.900, 8.780, -4.440, -5.650, -26.400, 3.080, 4.660, -4.660…
$ `1994` <dbl> -10.400, 7.440, -3.070, -1.650, -1.860, 3.660, 4.770, 8.960, …
$ `1995` <dbl> 20.300, 12.600, 1.710, -0.114, 11.600, -6.630, -3.910, 8.900,…
$ `1996` <dbl> 2.660, 8.650, 1.940, 3.090, 16.600, 3.900, 4.460, 6.080, 2.48…
$ `1997` <dbl> 2.8200, -10.6000, -0.5580, 8.5900, 2.7000, 2.3400, 7.0500, 3.…
$ `1998` <dbl> 2.8300, 12.3000, 3.5000, 3.3000, -2.6300, 1.8200, 2.7900, 7.6…
$ `1999` <dbl> 2.7100, 9.5800, 1.6800, 4.0100, 0.3870, 1.6900, -4.4500, 3.35…
$ `2000` <dbl> -1.0500, 6.7900, 0.9950, 0.4010, -0.0561, -0.6830, -1.8500, 6…
$ `2001` <dbl> -10.400, 6.690, 1.130, 10.000, -0.171, 0.251, -5.470, 10.100,…
$ `2002` <dbl> 22.1000, 2.8600, 2.5500, 3.5800, 10.7000, 0.8110, -12.0000, 1…
$ `2003` <dbl> 8.040, 5.450, 5.460, 4.170, -0.247, 3.670, 7.770, 14.400, 3.0…
$ `2004` <dbl> 2.5000, 5.3600, 3.8400, 4.1800, 7.4500, 5.7500, 7.9200, 10.90…
$ `2005` <dbl> 8.6100, 4.9600, 3.8000, 4.2100, 16.6000, 3.2900, 8.1200, 14.3…
$ `2006` <dbl> 1.590, 5.270, 0.188, 2.370, 15.000, 11.300, 7.250, 13.100, 1.…
$ `2007` <dbl> 10.800, 5.410, 1.850, -1.700, 19.600, 5.780, 7.440, 13.600, 2…
$ `2008` <dbl> 0.117, 6.840, 0.472, -5.600, 10.600, 0.378, 5.570, 6.690, 0.7…
$ `2009` <dbl> 17.300, 2.910, 0.179, -6.310, -0.464, -11.700, -0.276, -15.00…
$ `2010` <dbl> 5.1700, 2.9800, 2.0600, -4.7800, 0.5940, -8.5300, 7.9400, 1.1…
$ `2011` <dbl> 3.8500, 2.4900, 0.8570, -4.3000, 1.0300, -2.9600, 7.6500, 3.6…
$ `2012` <dbl> 11.200, 2.280, 1.160, NA, 2.130, 2.790, 0.761, 6.920, 1.780, …
$ `2013` <dbl> 1.130, 1.720, 1.610, NA, 1.030, 0.468, 3.090, 2.980, 1.170, 0…
$ `2014` <dbl> 0.837, 2.610, 2.180, NA, 2.240, 1.620, -0.622, 4.050, 1.410, …
$ `2015` <dbl> 2.110, 3.820, 2.100, NA, 2.460, 1.900, -0.128, 4.290, 1.480, …
$ `2016` <dbl> 2.680, 4.720, 2.360, NA, 2.770, 2.200, 0.367, 4.490, 1.730, 1…
$ `2017` <dbl> 2.760, 5.030, 2.500, NA, 0.262, 2.200, 0.861, 4.790, 1.700, 1…
$ `2018` <dbl> 3.020, 5.030, 2.630, NA, 3.460, 2.200, 0.861, 4.790, 1.710, 1…
$ `2019` <dbl> 3.380, 5.230, 2.680, NA, 3.550, 2.200, 0.861, 4.790, 1.770, 0…
Again, we will use the pivot_longer() function to transform the data to long format. We will also again change the country variable to Country using the rename() function, and we will make the Year variable numeric using the mutate() function.
gdp_growth %<>%
pivot_longer(cols = -country,
names_to = "Year",
values_to = "gdp_growth") %>%
rename(Country=country) %>%
mutate(Year = as.numeric(Year),
Label = "GDP Growth/Capita (%)") %>%
rename(GDP = gdp_growth)
Now let’s see how this data has changed:
# A tibble: 6 x 4
Country Year GDP Label
<chr> <dbl> <dbl> <chr>
1 Afghanistan 1801 NA GDP Growth/Capita (%)
2 Afghanistan 1802 NA GDP Growth/Capita (%)
3 Afghanistan 1803 NA GDP Growth/Capita (%)
4 Afghanistan 1804 NA GDP Growth/Capita (%)
5 Afghanistan 1805 NA GDP Growth/Capita (%)
6 Afghanistan 1806 NA GDP Growth/Capita (%)
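We can also count how many observations we have per year; the table below can be produced with a call along these lines (a sketch using dplyr’s count() function):

```r
# Number of rows (one per country) for each Year
gdp_growth %>%
  dplyr::count(Year)
```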
# A tibble: 219 x 2
Year n
<dbl> <int>
1 1801 194
2 1802 194
3 1803 194
4 1804 194
5 1805 194
6 1806 194
7 1807 194
8 1808 194
9 1809 194
10 1810 194
# … with 209 more rows
Again let’s check that the Country variable only contains values we would expect.
# A tibble: 194 x 1
Country
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 Andorra
5 Angola
6 Antigua and Barbuda
7 Argentina
8 Armenia
9 Australia
10 Austria
# … with 184 more rows
Also looks good!
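The country check above can be sketched with the distinct() function; here on a toy tibble with hypothetical values, since the real gdp_growth data is built earlier in the case study:

```r
library(dplyr)

# Toy long-format data standing in for gdp_growth (hypothetical values)
toy_gdp <- tibble(Country = c("Afghanistan", "Afghanistan", "Albania"),
                  Year    = c(1801, 1802, 1801),
                  GDP     = c(NA, 1.2, 0.5))

# List each unique country once so unexpected entries stand out
toy_gdp %>% distinct(Country)
```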
Now let’s take a look at the energy use per person data:
# A tibble: 6 x 57
country `1960` `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Albania NA NA NA NA NA NA NA NA NA NA
2 Algeria NA NA NA NA NA NA NA NA NA NA
3 Angola NA NA NA NA NA NA NA NA NA NA
4 Antigu… NA NA NA NA NA NA NA NA NA NA
5 Argent… NA NA NA NA NA NA NA NA NA NA
6 Armenia NA NA NA NA NA NA NA NA NA NA
# … with 46 more variables: `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
# `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
# `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
# `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
# `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
# `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
# `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
# `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
# `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
# `2013` <dbl>, `2014` <dbl>, `2015` <dbl>
Rows: 169
Columns: 57
$ country <chr> "Albania", "Algeria", "Angola", "Antigua and Barbuda", "Argen…
$ `1960` <dbl> NA, NA, NA, NA, NA, NA, 3060, 1550, NA, NA, NA, NA, NA, NA, 2…
$ `1961` <dbl> NA, NA, NA, NA, NA, NA, 3120, 1550, NA, NA, NA, NA, NA, NA, 2…
$ `1962` <dbl> NA, NA, NA, NA, NA, NA, 3170, 1680, NA, NA, NA, NA, NA, NA, 2…
$ `1963` <dbl> NA, NA, NA, NA, NA, NA, 3280, 1820, NA, NA, NA, NA, NA, NA, 3…
$ `1964` <dbl> NA, NA, NA, NA, NA, NA, 3350, 1860, NA, NA, NA, NA, NA, NA, 3…
$ `1965` <dbl> NA, NA, NA, NA, NA, NA, 3460, 1850, NA, NA, NA, NA, NA, NA, 3…
$ `1966` <dbl> NA, NA, NA, NA, NA, NA, 3550, 1900, NA, NA, NA, NA, NA, NA, 3…
$ `1967` <dbl> NA, NA, NA, NA, NA, NA, 3690, 1920, NA, NA, NA, NA, NA, NA, 3…
$ `1968` <dbl> NA, NA, NA, NA, NA, NA, 3760, 2050, NA, NA, NA, NA, NA, NA, 3…
$ `1969` <dbl> NA, NA, NA, NA, NA, NA, 3790, 2180, NA, NA, NA, NA, NA, NA, 3…
$ `1970` <dbl> NA, NA, NA, NA, NA, NA, 4060, 2420, NA, NA, NA, NA, NA, NA, 4…
$ `1971` <dbl> 785.0, 232.0, 556.0, NA, 1380.0, NA, 3990.0, 2510.0, NA, NA, …
$ `1972` <dbl> 866.0, 261.0, 584.0, NA, 1380.0, NA, 4040.0, 2630.0, NA, NA, …
$ `1973` <dbl> 763.0, 305.0, 568.0, NA, 1410.0, NA, 4260.0, 2830.0, NA, NA, …
$ `1974` <dbl> 777.0, 319.0, 565.0, NA, 1420.0, NA, 4290.0, 2730.0, NA, NA, …
$ `1975` <dbl> 827.0, 330.0, 536.0, NA, 1380.0, NA, 4350.0, 2650.0, NA, NA, …
$ `1976` <dbl> 891, 367, 515, NA, 1400, NA, 4410, 2870, NA, NA, 9580, 98, NA…
$ `1977` <dbl> 924.0, 399.0, 494.0, NA, 1420.0, NA, 4670.0, 2800.0, NA, NA, …
$ `1978` <dbl> 1010.0, 477.0, 527.0, NA, 1430.0, NA, 4630.0, 2890.0, NA, NA,…
$ `1979` <dbl> 864.0, 586.0, 518.0, NA, 1480.0, NA, 4680.0, 3140.0, NA, NA, …
$ `1980` <dbl> 1150, 579, 511, NA, 1490, NA, 4740, 3070, NA, NA, 7790, 103, …
$ `1981` <dbl> 989, 611, 497, NA, 1430, NA, 4690, 2900, NA, NA, 8300, 102, N…
$ `1982` <dbl> 967, 771, 473, NA, 1420, NA, 4820, 2830, NA, NA, 9070, 105, N…
$ `1983` <dbl> 1000, 808, 469, NA, 1420, NA, 4560, 2840, NA, NA, 8500, 105, …
$ `1984` <dbl> 1020, 776, 458, NA, 1450, NA, 4650, 2950, NA, NA, 8830, 104, …
$ `1985` <dbl> 917, 786, 470, NA, 1360, NA, 4600, 3050, NA, NA, 9920, 107, N…
$ `1986` <dbl> 964, 862, 462, NA, 1420, NA, 4620, 3060, NA, NA, 10300, 111, …
$ `1987` <dbl> 922, 828, 461, NA, 1480, NA, 4770, 3170, NA, NA, 9520, 107, N…
$ `1988` <dbl> 928, 850, 467, NA, 1500, NA, 4700, 3200, NA, NA, 10500, 114, …
$ `1989` <dbl> 896, 820, 465, NA, 1440, NA, 5000, 3140, NA, NA, 10200, 117, …
$ `1990` <dbl> 813, 856, 483, 1480, 1410, 2180, 5060, 3240, 3170, 2520, 1060…
$ `1991` <dbl> 573, 884, 480, NA, 1430, 2320, 4930, 3420, 3090, NA, 10100, 1…
$ `1992` <dbl> 418, 884, 467, NA, 1480, 1200, 4960, 3250, 2460, NA, 10800, 1…
$ `1993` <dbl> 412, 868, 468, NA, 1470, 652, 5150, 3260, 2180, NA, 11100, 12…
$ `1994` <dbl> 441, 819, 459, NA, 1540, 420, 5090, 3230, 1950, NA, 11600, 12…
$ `1995` <dbl> 417, 839, 445, NA, 1540, 511, 5130, 3370, 1810, NA, 11400, 13…
$ `1996` <dbl> 448, 798, 445, NA, 1580, 562, 5390, 3580, 1510, NA, 11100, 13…
$ `1997` <dbl> 385, 805, 443, NA, 1610, 594, 5470, 3550, 1440, NA, 12200, 13…
$ `1998` <dbl> 427, 821, 430, NA, 1650, 610, 5550, 3610, 1490, NA, 12400, 13…
$ `1999` <dbl> 576, 864, 439, NA, 1660, 594, 5610, 3590, 1370, NA, 11900, 13…
$ `2000` <dbl> 580, 866, 437, NA, 1660, 656, 5640, 3570, 1400, NA, 12000, 13…
$ `2001` <dbl> 597, 856, 442, NA, 1560, 657, 5450, 3760, 1410, NA, 11700, 14…
$ `2002` <dbl> 660, 904, 447, NA, 1500, 618, 5570, 3770, 1410, NA, 11500, 15…
$ `2003` <dbl> 648, 949, 466, NA, 1590, 657, 5570, 3970, 1480, NA, 11600, 15…
$ `2004` <dbl> 715, 948, 462, 1530, 1720, 698, 5600, 4010, 1540, 2060, 10900…
$ `2005` <dbl> 720, 974, 431, 1530, 1710, 843, 5560, 4090, 1600, 2110, 11700…
$ `2006` <dbl> 707, 1030, 456, 1580, 1840, 865, 5710, 4080, 1560, 2100, 1160…
$ `2007` <dbl> 680, 1070, 470, 1600, 1850, 973, 5870, 4020, 1410, 2070, 1120…
$ `2008` <dbl> 711, 1070, 491, NA, 1920, 1030, 5960, 4030, 1520, NA, 11300, …
$ `2009` <dbl> 732, 1150, 514, NA, 1850, 904, 5860, 3800, 1330, NA, 10300, 1…
$ `2010` <dbl> 729, 1110, 521, NA, 1910, 863, 5790, 4050, 1280, NA, 10200, 2…
$ `2011` <dbl> 765, 1140, 522, NA, 1930, 944, 5750, 3920, 1370, NA, 9910, 20…
$ `2012` <dbl> 688, 1220, 553, NA, 1920, 1030, 5570, 3890, 1470, NA, 9660, 2…
$ `2013` <dbl> 801, 1240, 534, NA, 1950, 1000, 5460, 3920, 1470, NA, 10400, …
$ `2014` <dbl> 808, 1320, 545, NA, 2020, 1020, 5330, 3760, 1500, NA, 10600, …
$ `2015` <dbl> NA, NA, NA, NA, NA, NA, 5480, 3800, NA, NA, NA, NA, NA, NA, 4…
To wrangle the energy_use data, we will again convert the data to long format, rename some variables, and make the Year values numeric using the mutate() function.
energy_use %<>%
pivot_longer(cols = -country,
names_to = "Year",
values_to = "energy_use") %>%
rename(Country = country) %>%
mutate(Year = as.numeric(Year),
Label = "Energy Use (kg, oil-eq./capita)") %>%
rename(Energy = energy_use)
# A tibble: 10 x 4
Country Year Energy Label
<chr> <dbl> <dbl> <chr>
1 Algeria 2011 1140 Energy Use (kg, oil-eq./capita)
2 South Sudan 1995 NA Energy Use (kg, oil-eq./capita)
3 Guatemala 1987 443 Energy Use (kg, oil-eq./capita)
4 Congo, Dem. Rep. 1988 338 Energy Use (kg, oil-eq./capita)
5 Venezuela 1984 2150 Energy Use (kg, oil-eq./capita)
6 St. Kitts and Nevis 1960 NA Energy Use (kg, oil-eq./capita)
7 Algeria 1982 771 Energy Use (kg, oil-eq./capita)
8 Cameroon 1964 NA Energy Use (kg, oil-eq./capita)
9 Cameroon 2015 NA Energy Use (kg, oil-eq./capita)
10 Iraq 1992 1280 Energy Use (kg, oil-eq./capita)
Now we will check the Country variable:
# A tibble: 169 x 1
Country
<chr>
1 Albania
2 Algeria
3 Angola
4 Antigua and Barbuda
5 Argentina
6 Armenia
7 Australia
8 Austria
9 Azerbaijan
10 Bahamas
# … with 159 more rows
Looks good!
# A tibble: 6 x 64
`Data Source` `World Developm… ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Last Updated… 43819 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Country Name Country Code Indi… Indi… 1960 1961 1962 1963 1964 1965
4 Aruba ABW Deat… SP.D… 6.38… 6.24… 6.11… 6.01… 5.91… 5.83…
5 Afghanistan AFG Deat… SP.D… 32.2… 31.6… 31.0… 30.5… 30.0… 29.5…
6 Angola AGO Deat… SP.D… 27.0… 26.8… 26.6… 26.4… 26.1… 25.9…
# … with 54 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
# ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
# ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
# ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
# ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, ...34 <chr>,
# ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
# ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>,
# ...45 <chr>, ...46 <chr>, ...47 <chr>, ...48 <chr>, ...49 <chr>,
# ...50 <chr>, ...51 <chr>, ...52 <chr>, ...53 <chr>, ...54 <chr>,
# ...55 <chr>, ...56 <chr>, ...57 <chr>, ...58 <chr>, ...59 <chr>,
# ...60 <chr>, ...61 <chr>, ...62 <chr>, ...63 <chr>, ...64 <chr>
We can see that there are a couple of metadata rows at the top which indicate when the data was last updated. We can also see that the real column names are in the 3rd row. So first we will replace the column names with the values of the 3rd row. Then we will remove the first 3 rows.
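On a toy tibble, this header fix might look like the following (a sketch with hypothetical values; the hidden case-study code may differ slightly):

```r
library(dplyr)

# Toy data mimicking the mortality file: metadata rows first,
# with the real column names sitting in the 3rd row (hypothetical values)
raw <- tibble(`Data Source` = c("Last Updated Date", NA, "Country Name", "Aruba"),
              ...2          = c("43819",            NA, "1960",         "6.38"))

colnames(raw) <- unlist(raw[3, ])  # promote the 3rd row to column names
raw <- raw[-c(1:3), ]              # drop the metadata rows and the header row
raw
```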
[1] "Data Source" "World Development Indicators"
[3] "...3" "...4"
[5] "...5" "...6"
[7] "...7" "...8"
[9] "...9" "...10"
[11] "...11" "...12"
[13] "...13" "...14"
[15] "...15" "...16"
[17] "...17" "...18"
[19] "...19" "...20"
[21] "...21" "...22"
[23] "...23" "...24"
[25] "...25" "...26"
[27] "...27" "...28"
[29] "...29" "...30"
[31] "...31" "...32"
[33] "...33" "...34"
[35] "...35" "...36"
[37] "...37" "...38"
[39] "...39" "...40"
[41] "...41" "...42"
[43] "...43" "...44"
[45] "...45" "...46"
[47] "...47" "...48"
[49] "...49" "...50"
[51] "...51" "...52"
[53] "...53" "...54"
[55] "...55" "...56"
[57] "...57" "...58"
[59] "...59" "...60"
[61] "...61" "...62"
[63] "...63" "...64"
[1] "Country Name" "Country Code" "Indicator Name" "Indicator Code"
[5] "1960" "1961" "1962" "1963"
[9] "1964" "1965" "1966" "1967"
[13] "1968" "1969" "1970" "1971"
[17] "1972" "1973" "1974" "1975"
[21] "1976" "1977" "1978" "1979"
[25] "1980" "1981" "1982" "1983"
[29] "1984" "1985" "1986" "1987"
[33] "1988" "1989" "1990" "1991"
[37] "1992" "1993" "1994" "1995"
[41] "1996" "1997" "1998" "1999"
[45] "2000" "2001" "2002" "2003"
[49] "2004" "2005" "2006" "2007"
[53] "2008" "2009" "2010" "2011"
[57] "2012" "2013" "2014" "2015"
[61] "2016" "2017" "2018" "2019"
Rows: 264
Columns: 64
$ `Country Name` <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorr…
$ `Country Code` <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "AR…
$ `Indicator Name` <chr> "Death rate, crude (per 1,000 people)", "Death rate,…
$ `Indicator Code` <chr> "SP.DYN.CDRT.IN", "SP.DYN.CDRT.IN", "SP.DYN.CDRT.IN"…
$ `1960` <chr> "6.3879999999999999", "32.219000000000001", "27.0970…
$ `1961` <chr> "6.2409999999999997", "31.649000000000001", "26.8590…
$ `1962` <chr> "6.1180000000000003", "31.093", "26.626999999999999"…
$ `1963` <chr> "6.0119999999999996", "30.550999999999998", "26.407"…
$ `1964` <chr> "5.9199999999999999", "30.021999999999998", "26.1939…
$ `1965` <chr> "5.8390000000000004", "29.501000000000001", "25.9660…
$ `1966` <chr> "5.7699999999999996", "28.984999999999999", "25.6900…
$ `1967` <chr> "5.7160000000000002", "28.468", "25.341999999999999"…
$ `1968` <chr> "5.6820000000000004", "27.946000000000002", "24.916"…
$ `1969` <chr> "5.6660000000000004", "27.417999999999999", "24.4179…
$ `1970` <chr> "5.6710000000000003", "26.879999999999999", "23.872"…
$ `1971` <chr> "5.6980000000000004", "26.334", "23.312000000000001"…
$ `1972` <chr> "5.7460000000000004", "25.780999999999999", "22.7770…
$ `1973` <chr> "5.8120000000000003", "25.222000000000001", "22.2959…
$ `1974` <chr> "5.8929999999999998", "24.658000000000001", "21.8850…
$ `1975` <chr> "5.9809999999999999", "24.087", "21.547999999999998"…
$ `1976` <chr> "6.0700000000000003", "23.507999999999999", "21.276"…
$ `1977` <chr> "6.157", "22.920000000000002", "21.047000000000001",…
$ `1978` <chr> "6.2359999999999998", "22.324000000000002", "20.8389…
$ `1979` <chr> "6.3079999999999998", "21.719999999999999", "20.6469…
$ `1980` <chr> "6.3760000000000003", "21.109000000000002", "20.4669…
$ `1981` <chr> "6.444", "20.489999999999998", "20.297999999999998",…
$ `1982` <chr> "6.5190000000000001", "19.864999999999998", "20.145"…
$ `1983` <chr> "6.6020000000000003", "19.239999999999998", "20.009"…
$ `1984` <chr> "6.6929999999999996", "18.617999999999999", "19.8889…
$ `1985` <chr> "6.7850000000000001", "18.004999999999999", "19.7890…
$ `1986` <chr> "6.8730000000000002", "17.405999999999999", "19.7100…
$ `1987` <chr> "6.9480000000000004", "16.826000000000001", "19.651"…
$ `1988` <chr> "7.0049999999999999", "16.268000000000001", "19.6099…
$ `1989` <chr> "7.0430000000000001", "15.738", "19.579000000000001"…
$ `1990` <chr> "7.0590000000000002", "15.241", "19.555", "5.9850000…
$ `1991` <chr> "7.0540000000000003", "14.782999999999999", "19.5330…
$ `1992` <chr> "7.0339999999999998", "14.362", "19.506", "6.1550000…
$ `1993` <chr> "7.0049999999999999", "13.974", "19.463999999999999"…
$ `1994` <chr> "6.9729999999999999", "13.616", "19.396000000000001"…
$ `1995` <chr> "6.9429999999999996", "13.282", "19.292000000000002"…
$ `1996` <chr> "6.9219999999999997", "12.964", "19.146000000000001"…
$ `1997` <chr> "6.9109999999999996", "12.654999999999999", "18.9520…
$ `1998` <chr> "6.915", "12.348000000000001", "18.706", "6.06700000…
$ `1999` <chr> "6.9340000000000002", "12.037000000000001", "18.404"…
$ `2000` <chr> "6.9710000000000001", "11.718", "18.036000000000001"…
$ `2001` <chr> "7.0220000000000002", "11.387", "17.597000000000001"…
$ `2002` <chr> "7.0839999999999996", "11.048", "17.09", "5.891", NA…
$ `2003` <chr> "7.1539999999999999", "10.704000000000001", "16.5219…
$ `2004` <chr> "7.2329999999999997", "10.356", "15.903", "6.0609999…
$ `2005` <chr> "7.3200000000000003", "10.003", "15.24", "6.20600000…
$ `2006` <chr> "7.4180000000000001", "9.6449999999999996", "14.539"…
$ `2007` <chr> "7.5270000000000001", "9.2870000000000008", "13.815"…
$ `2008` <chr> "7.6479999999999997", "8.9320000000000004", "13.0850…
$ `2009` <chr> "7.7800000000000002", "8.5839999999999996", "12.3670…
$ `2010` <chr> "7.9180000000000001", "8.25", "11.68", "6.8410000000…
$ `2011` <chr> "8.0609999999999999", "7.9359999999999999", "11.039"…
$ `2012` <chr> "8.2050000000000001", "7.6449999999999996", "10.4510…
$ `2013` <chr> "8.3469999999999995", "7.3799999999999999", "9.92099…
$ `2014` <chr> "8.4879999999999995", "7.141", "9.4540000000000006",…
$ `2015` <chr> "8.6270000000000007", "6.9290000000000003", "9.05199…
$ `2016` <chr> "8.7650000000000006", "6.742", "8.7159999999999993",…
$ `2017` <chr> "8.907", "6.5750000000000002", "8.4320000000000004",…
$ `2018` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ `2019` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
That is looking better! However, we also want to remove some variables, namely: Country Code, Indicator Name, and Indicator Code. We can do that using the select() function of the dplyr package. We can use the minus sign - to indicate the variables we don't want to keep. Otherwise, we will perform similar modifications as we performed on the other datasets. Note that these variable names need quotation marks around them because they contain spaces.
mortality %<>%
select(-"Country Code",
-"Indicator Name",
-"Indicator Code") %>%
rename(Country = "Country Name") %>%
pivot_longer(cols = -Country,
names_to = "Year",
values_to = "Deaths") %>%
mutate(Year = as.numeric(Year),
Deaths = as.numeric(Deaths),
Label = "Deaths/1000 People")
# A tibble: 6 x 4
Country Year Deaths Label
<chr> <dbl> <dbl> <chr>
1 Aruba 1960 6.39 Deaths/1000 People
2 Aruba 1961 6.24 Deaths/1000 People
3 Aruba 1962 6.12 Deaths/1000 People
4 Aruba 1963 6.01 Deaths/1000 People
5 Aruba 1964 5.92 Deaths/1000 People
6 Aruba 1965 5.84 Deaths/1000 People
Let’s check the Country variable:
# A tibble: 264 x 1
Country
<chr>
1 Aruba
2 Afghanistan
3 Angola
4 Albania
5 Andorra
6 Arab World
7 United Arab Emirates
8 Argentina
9 Armenia
10 American Samoa
# … with 254 more rows
Ok, in this case it looks like there are some regions included as well, like:
"East Asia & Pacific (excluding high income)", "Early-demographic dividend", "East Asia & Pacific", "Europe & Central Asia (excluding high income)", "Europe & Central Asia", "Euro area", "Fragile and conflict affected situations", "European Union", "Heavily indebted poor countries (HIPC)", "IBRD only", "IDA & IBRD total", "IDA total", "IDA blend", "IDA only", "Not classified", "Latin America & Caribbean (excluding high income)", "Latin America & Caribbean", "Least developed countries: UN classification", "Low income", "Lower middle income", "Low & middle income", "Late-demographic dividend", "Middle East & North Africa", "Middle income", "Middle East & North Africa (excluding high income)", "North America", "OECD members", "Other small states", "Pre-demographic dividend", "West Bank and Gaza", "Pacific island small states", "Post-demographic dividend", "French Polynesia", "South Asia", "Sub-Saharan Africa (excluding high income)", "Sub-Saharan Africa", "East Asia & Pacific (IDA & IBRD countries)", "Europe & Central Asia (IDA & IBRD countries)", "Latin America & the Caribbean (IDA & IBRD countries)", "Middle East & North Africa (IDA & IBRD countries)", "South Asia (IDA & IBRD)", "Sub-Saharan Africa (IDA & IBRD countries)", and "Upper middle income".
Let’s remove these regions:
mortality %<>%
filter(!Country %in% c("East Asia & Pacific (excluding high income)", "Early-demographic dividend", "East Asia & Pacific", "Europe & Central Asia (excluding high income)", "Europe & Central Asia", "Euro area", "Fragile and conflict affected situations", "European Union", "Heavily indebted poor countries (HIPC)", "IBRD only", "IDA & IBRD total", "IDA total", "IDA blend", "IDA only", "Not classified", "Latin America & Caribbean (excluding high income)", "Latin America & Caribbean", "Least developed countries: UN classification", "Low income"," Lower middle income", "Low & middle income", "Late-demographic dividend", "Middle East & North Africa", "Middle income", "Middle East & North Africa (excluding high income)", "North America", "OECD members", "Other small states", "Pre-demographic dividend", "West Bank and Gaza", "Pacific island small states", "Post-demographic dividend", "French Polynesia", "South Asia",
"Sub-Saharan Africa (excluding high income)", "Sub-Saharan Africa", "East Asia & Pacific (IDA & IBRD countries)", "Europe & Central Asia (IDA & IBRD countries)", "Latin America & the Caribbean (IDA & IBRD countries)", "Middle East & North Africa (IDA & IBRD countries)", "South Asia (IDA & IBRD)", "Sub-Saharan Africa (IDA & IBRD countries)", "Upper middle income"))
Now we will take a look at the US data about disasters and temperature.
# A tibble: 40 x 57
Year `Drought Count` `Drought Cost` `Drought Lower … `Drought Upper …
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1980 1 33.2 26.4 39.6
2 1981 0 0 0 0
3 1982 0 0 0 0
4 1983 1 7.8 5.5 9
5 1984 0 0 0 0
6 1985 0 0 0 0
7 1986 1 4.2 3.5 5
8 1987 0 0 0 0
9 1988 1 44.4 33.8 54
10 1989 1 6.4 5.6 7.4
# … with 30 more rows, and 52 more variables: `Drought Lower 90` <dbl>,
# `Drought Upper 90` <dbl>, `Drought Lower 95` <dbl>, `Drought Upper
# 95` <dbl>, `Flooding Count` <dbl>, `Flooding Cost` <dbl>, `Flooding Lower
# 75` <dbl>, `Flooding Upper 75` <dbl>, `Flooding Lower 90` <dbl>, `Flooding
# Upper 90` <dbl>, `Flooding Lower 95` <dbl>, `Flooding Upper 95` <dbl>,
# `Freeze Count` <dbl>, `Freeze Cost` <dbl>, `Freeze Lower 75` <dbl>, `Freeze
# Upper 75` <dbl>, `Freeze Lower 90` <dbl>, `Freeze Upper 90` <dbl>, `Freeze
# Lower 95` <dbl>, `Freeze Upper 95` <dbl>, `Severe Storm Count` <dbl>,
# `Severe Storm Cost` <dbl>, `Severe Storm Lower 75` <dbl>, `Severe Storm
# Upper 75` <dbl>, `Severe Storm Lower 90` <dbl>, `Severe Storm Upper
# 90` <dbl>, `Severe Storm Lower 95` <dbl>, `Severe Storm Upper 95` <dbl>,
# `Tropical Cyclone Count` <dbl>, `Tropical Cyclone Cost` <dbl>, `Tropical
# Cyclone Lower 75` <dbl>, `Tropical Cyclone Upper 75` <dbl>, `Tropical
# Cyclone Lower 90` <dbl>, `Tropical Cyclone Upper 90` <dbl>, `Tropical
# Cyclone Lower 95` <dbl>, `Tropical Cyclone Upper 95` <dbl>, `Wildfire
# Count` <dbl>, `Wildfire Cost` <dbl>, `Wildfire Lower 75` <dbl>, `Wildfire
# Upper 75` <dbl>, `Wildfire Lower 90` <dbl>, `Wildfire Upper 90` <dbl>,
# `Wildfire Lower 95` <dbl>, `Wildfire Upper 95` <dbl>, `Winter Storm
# Count` <dbl>, `Winter Storm Cost` <dbl>, `Winter Storm Lower 75` <dbl>,
# `Winter Storm Upper 75` <dbl>, `Winter Storm Lower 90` <dbl>, `Winter Storm
# Upper 90` <dbl>, `Winter Storm Lower 95` <dbl>, `Winter Storm Upper
# 95` <dbl>
We are specifically interested in the Year variable and the variables that contain the word "Count", so we will select them using the select() and contains() functions of the dplyr package. Since we are selecting variables based on the word "Count", we need quotation marks around it. Selecting the Year variable does not require quotation marks, as Year is the actual name of an existing variable.
# A tibble: 6 x 8
Year `Drought Count` `Flooding Count` `Freeze Count` `Severe Storm C…
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1980 1 1 0 0
2 1981 0 0 1 1
3 1982 0 0 0 2
4 1983 1 2 1 0
5 1984 0 0 0 2
6 1985 0 0 1 0
# … with 3 more variables: `Tropical Cyclone Count` <dbl>, `Wildfire
# Count` <dbl>, `Winter Storm Count` <dbl>
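The contains() selection can be illustrated on a toy tibble (hypothetical column names and values):

```r
library(dplyr)

toy <- tibble(Year             = c(1980, 1981),
              `Drought Count`  = c(1, 0),
              `Drought Cost`   = c(33.2, 0),
              `Flooding Count` = c(1, 0))

# Keep Year plus every column whose name contains "Count"
toy %>% select(Year, contains("Count"))
```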
Now we want to create a new variable that will be the sum of all the different types of disasters for each year.
We can create this new variable using the mutate() function of dplyr, and we will use the base rowSums() function to perform the calculation. We don't want to include the Year variable in our sum, so we can exclude it using the select() function within the rowSums() function. However, to do so we need to indicate that we are using the data that was piped into our mutate() and rowSums() functions. We can do so by using a period (.).
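A toy sketch of this mutate() + rowSums() pattern, using the . pronoun to refer to the piped data (hypothetical counts):

```r
library(dplyr)

toy <- tibble(Year             = c(1980, 1981),
              `Drought Count`  = c(1, 0),
              `Flooding Count` = c(1, 0),
              `Freeze Count`   = c(0, 1))

toy <- toy %>%
  # . refers to the data piped in by %>%, so select(., -Year)
  # drops Year before rowSums() adds up the remaining counts
  mutate(Disasters = rowSums(select(., -Year)))
toy$Disasters
```

Note that the . pronoun works with the magrittr pipe %>% used throughout this case study; the base R pipe |> does not support it in the same way.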
Rows: 40
Columns: 9
$ Year <dbl> 1980, 1981, 1982, 1983, 1984, 1985, 1986, 19…
$ `Drought Count` <dbl> 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0,…
$ `Flooding Count` <dbl> 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,…
$ `Freeze Count` <dbl> 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,…
$ `Severe Storm Count` <dbl> 0, 1, 2, 0, 2, 0, 1, 0, 0, 1, 1, 1, 4, 1, 1,…
$ `Tropical Cyclone Count` <dbl> 1, 0, 0, 1, 0, 3, 0, 0, 0, 1, 0, 1, 2, 0, 1,…
$ `Wildfire Count` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,…
$ `Winter Storm Count` <dbl> 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 2,…
$ Disasters <dbl> 3, 2, 3, 5, 2, 5, 2, 0, 1, 5, 3, 4, 7, 5, 6,…
Great, now we are going to keep only the variables we are interested in using the select() function, namely Year and our new Disasters variable. As you may recall from earlier in this case study, events of extreme precipitation appear to be associated with global warming; flooding events are counted within the Disasters total, which we will use as a rough proxy for this.
We are also going to add a new variable called Country to indicate that this data is from the United States. This will create a new variable where every value is United States.
us_disaster %<>%
dplyr::select(Year,
Disasters) %>%
mutate(Country = "United States") %>%
pivot_longer(cols = c(- Country, - Year),
names_to = "Indicator",
values_to = "Value") %>%
mutate(Label = "Number of Disasters")
us_disaster %>%
slice_head(n = 6)
# A tibble: 6 x 5
Year Country Indicator Value Label
<dbl> <chr> <chr> <dbl> <chr>
1 1980 United States Disasters 3 Number of Disasters
2 1981 United States Disasters 2 Number of Disasters
3 1982 United States Disasters 3 Number of Disasters
4 1983 United States Disasters 5 Number of Disasters
5 1984 United States Disasters 2 Number of Disasters
6 1985 United States Disasters 5 Number of Disasters
# A tibble: 6 x 3
Date Value Anomaly
<dbl> <dbl> <dbl>
1 189512 50.3 -1.68
2 189612 52.0 -0.03
3 189712 51.6 -0.46
4 189812 51.4 -0.59
5 189912 51.0 -1.01
6 190012 52.8 0.75
OK, so we want to remove the Anomaly variable, which is an indicator of how different the national average temperature for a given year was from the average temperature across 1901-2000, which was 52.02°F.
We also want to change the date values, which are currently listed as the year followed by the number 12. To do so we want to keep just the first 4 characters of the Date variable string values. We can use the str_sub() function of the stringr package to do this; we just need to indicate the start and end characters, in this case start = 1 and end = 4. Again we will create a Country variable. We will also change the name of the Date variable to Year so that it is consistent with our other datasets, and we want it to be numeric. We can accomplish both the renaming and the conversion to numeric within the mutate() function. We can then remove the Date variable and order the columns just like the other US data using the select() function.
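Before applying it to the data, a quick toy illustration of str_sub() and the numeric conversion (hypothetical value):

```r
library(stringr)

date_string <- "189512"  # a year followed by "12", as in the Date column
year_chr <- str_sub(date_string, start = 1, end = 4)  # keep characters 1-4
year_num <- as.numeric(year_chr)
year_num  # 1895
```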
us_temperature %<>%
dplyr::select(-Anomaly) %>%
mutate(Date = str_sub(Date, start = 1, end = 4)) %>%
mutate(Year = as.numeric(Date),
Country = "United States",
Indicator = "Temperature",
Label = "Temperature (Fahrenheit)") %>%
select(Year, Country, Indicator, Value, Label)
us_temperature %>%
slice_head(n = 6)
# A tibble: 6 x 5
Year Country Indicator Value Label
<dbl> <chr> <chr> <dbl> <chr>
1 1895 United States Temperature 50.3 Temperature (Fahrenheit)
2 1896 United States Temperature 52.0 Temperature (Fahrenheit)
3 1897 United States Temperature 51.6 Temperature (Fahrenheit)
4 1898 United States Temperature 51.4 Temperature (Fahrenheit)
5 1899 United States Temperature 51.0 Temperature (Fahrenheit)
6 1900 United States Temperature 52.8 Temperature (Fahrenheit)
Now we would like to join the different datasets together into one tibble. To do so, it is often necessary to have at least one column or variable with the same name to be used as a key for putting your data together. There are several *_join() functions available in the dplyr package for this purpose.
We will use the full_join() function, as we have different time spans for each dataset and we would like to retain as much data as possible. The full_join() function will simply create NA values for any of the years that are not in a given dataset. We can check this by using the base summary() function. This will also allow us to verify that there are column names that are consistent across the datasets that we wish to combine.
Country Year Emissions Label
Length:50688 Min. :1751 Min. : 0 Length:50688
Class :character 1st Qu.:1817 1st Qu.: 550 Class :character
Mode :character Median :1882 Median : 4390 Mode :character
Mean :1882 Mean : 83808
3rd Qu.:1948 3rd Qu.: 31925
Max. :2014 Max. :10300000
NA's :33772
Country Year GDP Label
Length:42486 Min. :1801 Min. :-67.500 Length:42486
Class :character 1st Qu.:1855 1st Qu.: 0.133 Class :character
Mode :character Median :1910 Median : 0.633 Mode :character
Mean :1910 Mean : 1.302
3rd Qu.:1965 3rd Qu.: 2.160
Max. :2019 Max. :145.000
NA's :2392
Country Year Energy Label
Length:9464 Min. :1960 Min. : 9.58 Length:9464
Class :character 1st Qu.:1974 1st Qu.: 505.75 Class :character
Mode :character Median :1988 Median : 1185.00 Mode :character
Mean :1988 Mean : 2238.82
3rd Qu.:2001 3rd Qu.: 3030.00
Max. :2015 Max. :22000.00
NA's :3544
Country Year Deaths Label
Length:13320 Min. :1960 Min. : 1.127 Length:13320
Class :character 1st Qu.:1975 1st Qu.: 6.950 Class :character
Mode :character Median :1990 Median : 9.217 Mode :character
Mean :1990 Mean :10.596
3rd Qu.:2004 3rd Qu.:12.630
Max. :2019 Max. :54.444
NA's :1431
Indeed, the Country, Year, and Label variables are present in all of the datasets. We can also see that the minimum and maximum years differ for nearly all of the datasets.
We need to specify which columns/variables we will join on using the by = argument of the full_join() function.
data_wide <- CO2_emissions %>%
full_join(gdp_growth, by = c("Country", "Year", "Label")) %>%
full_join(energy_use, by = c("Country", "Year", "Label")) %>%
full_join(mortality, by = c("Country", "Year", "Label"))
data_wide %>%
slice_sample(n = 6)
# A tibble: 6 x 7
Country Year Emissions Label GDP Energy Deaths
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 Norway 1843 62.3 CO2 Emissions (Metric Tons) NA NA NA
2 Botswana 1836 NA GDP Growth/Capita (%) 0.469 NA NA
3 Algeria 2008 110000 CO2 Emissions (Metric Tons) NA NA NA
4 Togo 1951 25.7 CO2 Emissions (Metric Tons) NA NA NA
5 Denmark 1875 NA GDP Growth/Capita (%) 0.863 NA NA
6 Philippines 1938 NA GDP Growth/Capita (%) 0.876 NA NA
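On toy data, the NA-filling behavior of full_join() looks like this (hypothetical values):

```r
library(dplyr)

a <- tibble(Country = "Norway", Year = c(1843, 1844), Emissions = c(62.3, 70.1))
b <- tibble(Country = "Norway", Year = c(1844, 1845), GDP = c(0.5, 0.7))

# Years present in only one tibble get NA for the other tibble's columns
joined <- full_join(a, b, by = c("Country", "Year"))
joined
```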
We can also do the same thing by using the reduce() function of the purrr package. This is a good option if you have many datasets to combine.
data_wide <- list(CO2_emissions,
gdp_growth,
energy_use,
mortality) %>%
reduce(full_join, by = c("Country", "Year", "Label"))
data_wide %>%
slice_head(n = 6)
# A tibble: 6 x 7
Country Year Emissions Label GDP Energy Deaths
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 Afghanistan 1751 NA CO2 Emissions (Metric Tons) NA NA NA
2 Afghanistan 1752 NA CO2 Emissions (Metric Tons) NA NA NA
3 Afghanistan 1753 NA CO2 Emissions (Metric Tons) NA NA NA
4 Afghanistan 1754 NA CO2 Emissions (Metric Tons) NA NA NA
5 Afghanistan 1755 NA CO2 Emissions (Metric Tons) NA NA NA
6 Afghanistan 1756 NA CO2 Emissions (Metric Tons) NA NA NA
Rows: 115,958
Columns: 7
$ Country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
$ Year <dbl> 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760,…
$ Emissions <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Label <chr> "CO2 Emissions (Metric Tons)", "CO2 Emissions (Metric Tons)…
$ GDP <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Energy <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Deaths <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Nice, looks good!
We will also make a long version of this data, where we will create a new variable called Indicator that indicates which dataset each value came from, and we will collapse the values from the columns called Emissions (CO2 Emissions (Metric Tons)), GDP (GDP Growth/Capita (%)), Energy (Energy Use (kg, oil-eq./capita)), and Deaths (Deaths/1000 People) into a single Value column.
data_long <- data_wide %>%
pivot_longer(cols = c(-Country, -Year, -Label),
names_to = "Indicator",
values_to = "Value")
data_long %>%
slice_sample(n = 6)
# A tibble: 6 x 5
Country Year Label Indicator Value
<chr> <dbl> <chr> <chr> <dbl>
1 Slovenia 1970 CO2 Emissions (Metric Tons) GDP NA
2 Kyrgyz Republic 1968 CO2 Emissions (Metric Tons) GDP NA
3 Timor-Leste 1962 Deaths/1000 People GDP NA
4 St. Lucia 1999 Energy Use (kg, oil-eq./capita) Emissions NA
5 Poland 1816 CO2 Emissions (Metric Tons) Emissions 678
6 China 1832 GDP Growth/Capita (%) Energy NA
We will now combine this data with the US data about disasters and temperatures. We will use the bind_rows() function, which will simply append the us_disaster data and the us_temperature data after the data_long data.
# A tibble: 6 x 5
Year Country Indicator Value Label
<dbl> <chr> <chr> <dbl> <chr>
1 1980 United States Disasters 3 Number of Disasters
2 1981 United States Disasters 2 Number of Disasters
3 1982 United States Disasters 3 Number of Disasters
4 1983 United States Disasters 5 Number of Disasters
5 1984 United States Disasters 2 Number of Disasters
6 1985 United States Disasters 5 Number of Disasters
# A tibble: 6 x 5
Year Country Indicator Value Label
<dbl> <chr> <chr> <dbl> <chr>
1 1895 United States Temperature 50.3 Temperature (Fahrenheit)
2 1896 United States Temperature 52.0 Temperature (Fahrenheit)
3 1897 United States Temperature 51.6 Temperature (Fahrenheit)
4 1898 United States Temperature 51.4 Temperature (Fahrenheit)
5 1899 United States Temperature 51.0 Temperature (Fahrenheit)
6 1900 United States Temperature 52.8 Temperature (Fahrenheit)
data_long <-list(data_long,
us_disaster,
us_temperature) %>%
bind_rows()
data_long$Country <- as.factor(data_long$Country)
We can check the top and bottom of the new data_long tibble to see that our us_temperature data is at the bottom. To see the end of our tibble we can use the slice_tail() function of the dplyr package.
# A tibble: 6 x 5
Country Year Label Indicator Value
<fct> <dbl> <chr> <chr> <dbl>
1 Afghanistan 1751 CO2 Emissions (Metric Tons) Emissions NA
2 Afghanistan 1751 CO2 Emissions (Metric Tons) GDP NA
3 Afghanistan 1751 CO2 Emissions (Metric Tons) Energy NA
4 Afghanistan 1751 CO2 Emissions (Metric Tons) Deaths NA
5 Afghanistan 1752 CO2 Emissions (Metric Tons) Emissions NA
6 Afghanistan 1752 CO2 Emissions (Metric Tons) GDP NA
# A tibble: 6 x 5
Country Year Label Indicator Value
<fct> <dbl> <chr> <chr> <dbl>
1 United States 2014 Temperature (Fahrenheit) Temperature 52.5
2 United States 2015 Temperature (Fahrenheit) Temperature 54.4
3 United States 2016 Temperature (Fahrenheit) Temperature 54.9
4 United States 2017 Temperature (Fahrenheit) Temperature 54.6
5 United States 2018 Temperature (Fahrenheit) Temperature 53.5
6 United States 2019 Temperature (Fahrenheit) Temperature 52.7
# A tibble: 10 x 5
Country Year Label Indicator Value
<fct> <dbl> <chr> <chr> <dbl>
1 Moldova 2004 CO2 Emissions (Metric Tons) GDP NA
2 Ukraine 1970 CO2 Emissions (Metric Tons) Energy NA
3 Romania 1853 CO2 Emissions (Metric Tons) Emissions NA
4 Algeria 1848 CO2 Emissions (Metric Tons) GDP NA
5 Vietnam 1973 GDP Growth/Capita (%) Emissions NA
6 Ukraine 1981 GDP Growth/Capita (%) Energy NA
7 Brazil 1894 GDP Growth/Capita (%) GDP NA
8 Ethiopia 1951 GDP Growth/Capita (%) Energy NA
9 Panama 1918 CO2 Emissions (Metric Tons) GDP NA
10 Somalia 1956 GDP Growth/Capita (%) Energy NA
Details about the difference between full_join() and bind_rows()
The difference between these functions is that bind_rows() essentially just appends each dataset to the others, whereas full_join() collapses rows that match on the joining variables. Below you will see what data_wide would have looked like if it had been made with bind_rows(), with full_join() not joined by the Label variable, and with full_join() joined by the Label variable. Since the Label variable has a unique value for each type of Indicator, joining by Label causes the full_join() result to be the same as the bind_rows() result. We will specifically look at the values for China in the year 1980.
data_wide_br <- list(CO2_emissions,
gdp_growth,
energy_use,
mortality) %>%
bind_rows()
data_wide_fj <- list(CO2_emissions,
gdp_growth,
energy_use,
mortality) %>%
reduce(full_join, by = c("Country", "Year"))
data_wide_fj_label <- list(CO2_emissions,
gdp_growth,
energy_use,
mortality) %>%
reduce(full_join, by = c("Country", "Year", "Label"))
dim(data_wide_fj)
[1] 54726 10
[1] 115958 7
[1] TRUE
# A tibble: 1 x 10
Country Year Emissions Label.x GDP Label.y Energy Label.x.x Deaths
<chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 China 1980 1470000 CO2 Em… 2.16 GDP Gr… 609 Energy U… 6.34
# … with 1 more variable: Label.y.y <chr>
# A tibble: 4 x 7
Country Year Emissions Label GDP Energy Deaths
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 China 1980 1470000 CO2 Emissions (Metric Tons) NA NA NA
2 China 1980 NA GDP Growth/Capita (%) 2.16 NA NA
3 China 1980 NA Energy Use (kg, oil-eq./capita) NA 609 NA
4 China 1980 NA Deaths/1000 People NA NA 6.34
# A tibble: 4 x 7
Country Year Emissions Label GDP Energy Deaths
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 China 1980 1470000 CO2 Emissions (Metric Tons) NA NA NA
2 China 1980 NA GDP Growth/Capita (%) 2.16 NA NA
3 China 1980 NA Energy Use (kg, oil-eq./capita) NA 609 NA
4 China 1980 NA Deaths/1000 People NA NA 6.34
We will also create a new variable called Region that will indicate whether the data is about the United States or a different country, based on the values in the Country variable. We will use the case_when() function of the dplyr package to do this. If the Country variable is equal to "United States", the value of the new variable will also be "United States"; whereas if the Country variable is any other character string value, such as "Afghanistan", the value of the new variable will be "Rest of the World". The new values for the new variable Region are indicated after each conditional statement using the ~ symbol.
data_long %<>%
mutate(Region = case_when(Country == "United States" ~ "United States",
Country != "United States" ~ "Rest of the World"))
data_long %>%
slice_head(n = 6)
# A tibble: 6 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Afghanistan 1751 CO2 Emissions (Metric Tons) Emissions NA Rest of the Wor…
2 Afghanistan 1751 CO2 Emissions (Metric Tons) GDP NA Rest of the Wor…
3 Afghanistan 1751 CO2 Emissions (Metric Tons) Energy NA Rest of the Wor…
4 Afghanistan 1751 CO2 Emissions (Metric Tons) Deaths NA Rest of the Wor…
5 Afghanistan 1752 CO2 Emissions (Metric Tons) Emissions NA Rest of the Wor…
6 Afghanistan 1752 CO2 Emissions (Metric Tons) GDP NA Rest of the Wor…
To remove entries with NA values we can use the drop_na() function of the tidyr package to drop all years with missing data.
You can see that by removing the NA values, the data for Afghanistan now starts in 1949 instead of 1751.
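A minimal sketch of this step, assuming the pipeline and object names used above:

```r
library(tidyr)

# Drop any row with a missing value in any column (here, chiefly Value)
data_long <- data_long %>%
  drop_na()
```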
# A tibble: 6 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Afghanistan 1949 CO2 Emissions (Metric Tons) Emissions 14.7 Rest of the Wor…
2 Afghanistan 1950 CO2 Emissions (Metric Tons) Emissions 84.3 Rest of the Wor…
3 Afghanistan 1951 CO2 Emissions (Metric Tons) Emissions 91.7 Rest of the Wor…
4 Afghanistan 1952 CO2 Emissions (Metric Tons) Emissions 91.7 Rest of the Wor…
5 Afghanistan 1953 CO2 Emissions (Metric Tons) Emissions 106 Rest of the Wor…
6 Afghanistan 1954 CO2 Emissions (Metric Tons) Emissions 106 Rest of the Wor…
Now we will create some simple plots to examine the data.
We can check the time span of this data by referring back to the What are the data? section. To make these plots we will use the ggplot2 package. The first step in creating a plot is to define what data we intend to use, what will be plotted on the x-axis and y-axis, and whether any variable will determine the color, the fill (the interior color of plots with areas to fill, such as bar plots), or the grouping. All of these are defined using the aes() argument, which is short for aesthetic mappings.
First we will take a look at the CO2 emission data.
We first need to give the correct data input. We will filter our data to only include the CO2 emissions data by using the filter() function of the dplyr package. To use this function we need to specify what value we want for a given variable. In this case we want all rows where the Indicator variable is equal to the word Emissions. Notice that this needs to be in quotes, while the variable name does not.
Then we use the aes() argument of the ggplot() function to define that our x-axis will be the Year variable, the y-axis will be the emission Value variable, and that our data should be grouped, or separated, by the Country variable. If we were to stop there we would get a blank plot, as you can see below. We need to add another layer to define how we want the plot to look. We do so by using the + sign in between each command.
data_long %>%
filter(Indicator == "Emissions") %>%
ggplot(aes(x = Year, y = Value, group = Country))
We will use the geom_line() function because we would like to create a line plot. There are many geom_* functions to choose from that create many different types of plots.
Type geom into the RStudio console and you will see many options to scroll through.
Since we have many overlapping lines, we will make our lines slightly transparent by using the alpha argument. This takes values from 0 to 1, where 0 is completely transparent and 1 is completely opaque. We will also add labels using the labs() function. Again, notice that a plus sign is used between each layer that we add to the plot. To make CO2 appear with a subscript we can use ~CO[2]~. We will also use the function theme_linedraw() of ggplot2 to change the general appearance of the plot.
Type theme_ in the RStudio console to see the various plot theme options available.
We will also use the theme() function to change the font size of the x-axis, y-axis, axis titles, and the caption as shown below. To find out what each element of the plot is called in this function, type ?theme() in the console. You will see a very large list that includes other plot aspects like the background and the legend. This function can be used to modify your plot to your specifications. We will also use it to remove the legend title by using element_blank(). In this case, we are also saving the plot to an object called co2plot. To show the plot we simply type the name of the object.
co2plot <-data_long %>%
filter(Indicator == "Emissions") %>%
ggplot(aes(x = Year, y = Value, group = Country))+
geom_line(alpha = 0.4) +
labs(title = "Country" ~CO[2]~ "Emissions per Year, 1751-2014",
caption = "Limited to reporting countries",
y = "Emissions (Metric Tons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
co2plot
Great! We’ve created our first plot. We can see that many countries show a dramatic increase in emissions over time, with a handful of countries at particularly high levels. What about the United States? Which line indicates the emissions in the US? We can add another layer on top of our first plot to add a red line just for the US data. To do this we need to indicate what data we would like to plot, so we filter for just the US data, and then we need to indicate that it will be colored by Country, even though in this case we only have one line to color. The default color would be a salmon pink, but we would like red, so we will use the scale_colour_manual() function to manually choose the color we want, using scale_colour_manual(values = c("red")). Notice that the color name needs to be in quotes and that the values = argument is used to specify what color values to use.
We can add this line to the plot in two ways: we can add the code for this layer to the original code that we used to create co2plot, or we can simply add a layer to the existing plot object using +.
data_long %>%
filter(Indicator == "Emissions") %>%
ggplot(aes(x = Year, y = Value, group = Country))+
geom_line(alpha = 0.4) +
labs(title = "Country" ~CO[2]~ "Emissions per Year, 1751-2014",
caption = "Limited to reporting countries",
y = "Emissions (Metric Tons)") +
geom_line(data = data_long %>%
filter(Indicator == "Emissions",
Country == "United States"), aes(x=Year, y=Value, color = Country)) +
scale_colour_manual(values=c("red")) +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
co2plot + geom_line(data = data_long %>%
filter(Indicator == "Emissions",
Country == "United States"), aes(x=Year, y=Value, color = Country)) +
scale_colour_manual(values=c("red"))
It looks like the US was long the largest CO2-emitting country, until recently, when it was surpassed by another country.
Let’s figure out which country, by seeing what the top 10 emission-producing countries were in 2014. We can do so by filtering the data for 2014, which was the final year of the data. Then we can make a rank variable based on the Value variable for the amount of emissions produced. There are many functions in the dplyr package for ranking values that are based on the SQL rank functions. SQL is another programming language for managing large amounts of data. The rank functions differ mostly in how they deal with ties in the data. We will use dense_rank(), as we do not want gaps between ranks.
We want to do this in descending order because we want to rank from largest to smallest, so we will use the desc() function of the dplyr package. Then we will arrange the output by rank using the arrange() function of the dplyr package.
top_10_count <-data_long %>%
filter(Indicator=="Emissions") %>%
filter(Year==2014) %>%
mutate(rank=dense_rank(desc(Value))) %>%
filter(rank<=10) %>%
arrange(rank)
top_10_count
# A tibble: 10 x 7
Country Year Label Indicator Value Region rank
<fct> <dbl> <chr> <chr> <dbl> <chr> <int>
1 China 2014 CO2 Emissions (Metr… Emissions 1.03e7 Rest of the … 1
2 United Stat… 2014 CO2 Emissions (Metr… Emissions 5.25e6 United States 2
3 India 2014 CO2 Emissions (Metr… Emissions 2.24e6 Rest of the … 3
4 Russia 2014 CO2 Emissions (Metr… Emissions 1.71e6 Rest of the … 4
5 Japan 2014 CO2 Emissions (Metr… Emissions 1.21e6 Rest of the … 5
6 Germany 2014 CO2 Emissions (Metr… Emissions 7.20e5 Rest of the … 6
7 Iran 2014 CO2 Emissions (Metr… Emissions 6.49e5 Rest of the … 7
8 Saudi Arabia 2014 CO2 Emissions (Metr… Emissions 6.01e5 Rest of the … 8
9 South Korea 2014 CO2 Emissions (Metr… Emissions 5.87e5 Rest of the … 9
10 Canada 2014 CO2 Emissions (Metr… Emissions 5.37e5 Rest of the … 10
We can see that China is now the top emission producing country.
Let’s make a plot of these top countries. We need to filter the data by using the %in% operator to keep only the countries in our Country variable that also appear in the Country variable within top_10_count. We can use the pull() function, also of the dplyr package, to grab just the Country data out of top_10_count.
Since we have 10 countries we will want to differentiate them by color.
To color our plot we will use the viridis color palette, which is accessible to people with color blindness. We will use the scale_color_viridis_d() function, which is available simply by loading the ggplot2 package. There are a few variations, such as discrete as _d, binned continuous as _b, and continuous as _c. See here for more information.
Top10b <- data_long %>%
filter(Country %in% pull(top_10_count, Country)) %>%
filter(Indicator == "Emissions") %>%
filter(Year >= 1900) %>%
ggplot(aes(x = Year, y = Value, color = Country)) +
geom_line() +
scale_color_viridis_d()+
theme_linedraw() +
labs(title = "Top 10 Emissions-producing Countries in 2014 (1900-2014)",
subtitle = "Ordered by Emissions Produced in 2014",
y = "Emissions (Metric Tons)",
x = "Year") +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
Top10b
It’s still a bit difficult to tell which line corresponds to which country. So, let’s add a label. One way to do this is to add a text layer to our plot using the geom_text() function of the ggplot2 package. We need to first specify what data we will use; in this case we will filter for just the data for the last year (which we can do using the last() function of the dplyr package), and then we need to indicate that our label will be based on the Country variable using the aes() aesthetic mapping argument. We will also get rid of our legend, since we will not need it anymore, by using the theme() function of the ggplot2 package.
Top10b +
geom_text(data = data_long %>%
filter(Country %in% pull(top_10_count, Country)) %>%
filter(Indicator == "Emissions") %>%
filter(Year == last(Year)), aes(label = Country)) +
theme(legend.position = "none")
Not bad, but some of the labels are overlapping and difficult to read. We can use the check_overlap = TRUE argument within the geom_text() function to remove overlapping labels, and we can expand the plot area horizontally so that the names are not cut off by using scale_x_continuous(expand = c(0.2,0)).
Top10b +
geom_text(data = data_long %>%
filter(Country %in% pull(top_10_count, Country)) %>%
filter(Indicator=="Emissions") %>%
filter(Year == last(Year)), aes(label = Country), check_overlap = TRUE) +
scale_x_continuous(expand = c(0.2,0))+
theme(legend.position = "none")
This is easier to read now, but it also causes us to lose some of the labels. There are several alternative ways to keep all of our labels and make them easier to read. The first package we will show is called directlabels.
The simplest option is to use the direct.label() function, which will automatically add labels at the end of the lines. However, it is a bit difficult to see some of our labels, as they get automatically sized to fit the plot.
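The call itself is not shown above; presumably it simply wraps the existing plot object, a sketch of which would be:

```r
library(directlabels)

# Automatically place a label for each country's line at a default position
direct.label(Top10b)
```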
Alternatively, this can be done in a more ggplot2-like layering style by using the geom_dl() function.
Top10b + scale_x_continuous(expand = c(0.3,0)) +
geom_dl(aes(label = Country), method = list("last.bumpup")) +
theme(legend.position = "none")
This is nice and legible now. All 10 country names are listed, they are ordered by the last data point, and they sit relatively close to the lines they correspond to.
Another option is to use a different method in the directlabels package. Here is a list of options.
The "angled.boxes" method looks nice for some plots but doesn’t work very well for ours:
However, the "last.polygons" method works quite well:
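The code for these method variants is not shown; based on the geom_dl() pattern used above, swapping in the method name would presumably look like this sketch:

```r
# "last.polygons" draws each label in a box at the end of its line
Top10b +
  scale_x_continuous(expand = c(0.3, 0)) +
  geom_dl(aes(label = Country), method = list("last.polygons")) +
  theme(legend.position = "none")
```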
The second package is ggrepel, which is especially good for crowded labels that might overlap one another. It allows for more control than the directlabels package. We will use the geom_text_repel() function. Just like with geom_text(), first we need to specify what data we want to include. We then specify with the aes() argument that our label will be based on the Country variable, and we again specify what variables to use for our x-axis and y-axis so that we indicate where the labels should be plotted.
Top10b + geom_text_repel(data = data_long %>%
filter(Country %in% pull(top_10_count, Country)) %>%
filter(Indicator=="Emissions") %>%
filter(Year == last(Year)),
aes(label = Country,
x = Year,
y = Value)) +
theme(legend.position = "none") +
scale_x_continuous(expand = c(0.3,0))
You can see that this package creates segments that connect each label to its line.
There are many arguments to use to style your labels just the way that you want:
See here for more details.
Top10b + geom_text_repel(data = data_long %>%
filter(Country %in% pull(top_10_count, Country)) %>%
filter(Indicator=="Emissions") %>%
filter(Year == last(Year)),
aes(label = Country,
x = Year,
y = Value),
nudge_x = 10,
hjust = 1,
vjust = 1,
segment.size = 0.25,
force = 1)+
theme(legend.position = "none")+
scale_x_continuous(expand = c(0.3,0))+
scale_y_continuous(expand = c(0.3,0))
Nice, that looks pretty good.
Now let’s try showing our data in a different way. This time we will create a geom_tile() plot. To color our plot we will use the viridis color palette again, but this time with scale_fill_viridis_c(); recall that the _c indicates a continuous scale. See here for more information. Again, we will filter our data to include only the countries in the Country variable of top_10_count. Recall that the pull() function grabs the Country variable data values within top_10_count. Then we will use the fct_reorder() function of the forcats package to order our countries based on the last emission value, in 2014.
To use this function, the variable to be reordered is listed first, then the variable used to determine the order, followed by a function that determines the order, in this case the last value using the last() function (recall that this is also a function of the dplyr package).
Top10<-data_long %>%
filter(Country %in% pull(top_10_count, Country)) %>%
filter(Indicator=="Emissions") %>%
filter(Year>=1900)%>%
ggplot(aes(x=Year, y=fct_reorder(Country, Value, last))) +
geom_tile( aes(fill=log(Value))) +
scale_fill_viridis_c()+
scale_x_continuous(breaks = seq(1900,2014,by=5),
labels = seq(1900,2014,by=5)) +
labs(title = "Top 10 "~CO[2]~"Emission-producing Countries in 2014",
subtitle = "Ordered by Emissions Produced in 2014",
fill = "Ln(CO2 Emissions (Metric Tons))") +
theme_classic() +
theme(axis.text.x = element_text(size = 12, angle = 90),
axis.text.y = element_text(size = 12),
axis.title = element_blank(),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.position = "bottom")
Top10
We can also create this plot directly, without using the top_10_count tibble, by creating a new variable for the last value that we will call last_val, in other words the emission value in 2014 for each country. To do this we first need to use the group_by() function of the dplyr package to make sure that the last value is calculated and repeated for each row for a given country. Here you can see that this is the case for Afghanistan.
data_long %>%
filter(Indicator=="Emissions") %>%
filter(Year>=1900) %>%
group_by(Country) %>%
mutate(last_val = last(Value))
# A tibble: 14,542 x 7
# Groups: Country [192]
Country Year Label Indicator Value Region last_val
<fct> <dbl> <chr> <chr> <dbl> <chr> <dbl>
1 Afghanist… 1949 CO2 Emissions (Metri… Emissions 14.7 Rest of the … 9810
2 Afghanist… 1950 CO2 Emissions (Metri… Emissions 84.3 Rest of the … 9810
3 Afghanist… 1951 CO2 Emissions (Metri… Emissions 91.7 Rest of the … 9810
4 Afghanist… 1952 CO2 Emissions (Metri… Emissions 91.7 Rest of the … 9810
5 Afghanist… 1953 CO2 Emissions (Metri… Emissions 106 Rest of the … 9810
6 Afghanist… 1954 CO2 Emissions (Metri… Emissions 106 Rest of the … 9810
7 Afghanist… 1955 CO2 Emissions (Metri… Emissions 154 Rest of the … 9810
8 Afghanist… 1956 CO2 Emissions (Metri… Emissions 183 Rest of the … 9810
9 Afghanist… 1957 CO2 Emissions (Metri… Emissions 293 Rest of the … 9810
10 Afghanist… 1958 CO2 Emissions (Metri… Emissions 330 Rest of the … 9810
# … with 14,532 more rows
Now we will also create a rank variable, as we did when creating top_10_count, calculated as the rank of each country based on its last_val value (again, the emission value in the last year of the data, 2014). This time we want to ungroup our data first, as we want the rank to be calculated across countries.
data_long %>%
filter(Indicator=="Emissions") %>%
filter(Year>=1900) %>%
group_by(Country) %>%
mutate(last_val = last(Value)) %>%
ungroup() %>%
mutate(rank=dense_rank(desc(last_val))) %>%
filter(rank<=10)
# A tibble: 1,054 x 8
Country Year Label Indicator Value Region last_val rank
<fct> <dbl> <chr> <chr> <dbl> <chr> <dbl> <int>
1 Canada 1900 CO2 Emissions (Met… Emissions 20600 Rest of the… 537000 10
2 Canada 1901 CO2 Emissions (Met… Emissions 23900 Rest of the… 537000 10
3 Canada 1902 CO2 Emissions (Met… Emissions 25700 Rest of the… 537000 10
4 Canada 1903 CO2 Emissions (Met… Emissions 28000 Rest of the… 537000 10
5 Canada 1904 CO2 Emissions (Met… Emissions 33100 Rest of the… 537000 10
6 Canada 1905 CO2 Emissions (Met… Emissions 35400 Rest of the… 537000 10
7 Canada 1906 CO2 Emissions (Met… Emissions 37400 Rest of the… 537000 10
8 Canada 1907 CO2 Emissions (Met… Emissions 47000 Rest of the… 537000 10
9 Canada 1908 CO2 Emissions (Met… Emissions 47400 Rest of the… 537000 10
10 Canada 1909 CO2 Emissions (Met… Emissions 45400 Rest of the… 537000 10
# … with 1,044 more rows
Now we can put it all together to create the plot directly from data_long.
Top10<-data_long %>%
filter(Indicator == "Emissions") %>%
filter(Year >= 1900) %>%
group_by(Country) %>%
mutate(last_val = last(Value)) %>%
ungroup() %>%
mutate(rank=dense_rank(desc(last_val))) %>%
filter(rank<=10) %>%
ggplot(aes(x=Year, y=fct_reorder(Country, Value, last))) +
geom_tile( aes(fill=log(Value))) +
scale_fill_viridis_c() +
scale_x_continuous(breaks = seq(1900,2014,by=5),
labels = seq(1900,2014,by=5)) +
labs(title = "Top 10 "~CO[2]~"Emission-producing Countries in 2014",
subtitle = "Ordered by Emissions Produced in 2014",
fill = "Ln(CO2 Emissions (Mg))") +
theme_classic() +
theme(axis.text.x = element_text(size = 12, angle = 90),
axis.text.y = element_text(size = 12),
axis.title = element_blank(),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.position = "bottom")
Top10
We can see that Germany had very low emission rates at the end of World War II. We see that the US has consistently had high emission rates since 1900, but that China’s emission rates recently surpassed those of the US. The portions of the plot that are white indicate that there is no emission data for that country in those years.
Now let’s take a look at the data in a slightly different way. Let’s look at overall global emissions by calculating, for each year, a sum of the emission values across the different countries. Note that this is limited to only the countries included in the dataset.
To calculate this value we will first use the group_by() function of the dplyr package. This allows our calculation to be performed on the data aggregated by the different values of the Year variable. Otherwise, we would simply get a sum of overall emissions across all of the years in the dataset.
Then we will use the summarize() function (also of the dplyr package) and the base sum() function to calculate a sum of the emission values for each year.
Since we will be plotting only one value per year, we do not need to assign a group in the aes() argument. This time we will make the plotted line a bit thicker using the size argument of the geom_line() function.
CO2_world<-data_long %>%
filter(Indicator == "Emissions") %>%
group_by(Year) %>%
summarize(Value = sum(Value)) %>%
ggplot(aes(x = Year, y = Value)) +
geom_line(size = 1.5) +
labs(title = "World "~CO[2]~" Emissions per Year, 1751-2014",
caption = "Limited to reporting countries",
y = "Emissions (Metric Tons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
CO2_world
OK, we can now clearly see that global CO2 emissions have risen dramatically since 1900.
We can make an animation of the data using the gganimate package.
animation_1 <- data_long %>%
filter(Indicator == "Deaths") %>%
ggplot(aes(x = Year,
y = Value,
group = Country,
color = Region,
size = Region,
alpha = Region)) +
geom_point() +
scale_color_manual(values = c("Red","Black")) +
scale_alpha_manual(values = c(0.1, 1)) +
scale_size_manual(values = c(0.25, 2)) +
labs(title = "Distribution of Indicators by Year and Value, 1980-2010",
y = "Crude Mortality Rate") +
theme(axis.text.x = element_text(angle = 90)) +
theme_classic() +
transition_time(as.integer(Year)) +
gganimate::shadow_wake(wake_length = 1, alpha = FALSE)
gganimate::animate(animation_1, fps = 10, duration = 5)
Now we will take a look at the GDP growth of various countries.
data_long %>%
filter(Indicator == "GDP") %>%
ggplot(aes(x = Year, y = Value, group = Country)) +
geom_line(alpha = 0.2) +
labs(title = "Country GDP Growth per Capita per Year (Annual %), 1801-2019",
caption = "Limited to reporting countries",
y = "GDP Growth per Capita (Annual %)") +
geom_line(data = data_long %>%
filter(Indicator == "GDP",
Country == "United States"),
aes(x=Year, y=Value, color = Country)) +
scale_colour_manual(values=c("red")) +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
We can see that the variation in GDP growth has become greater over time.
data_long %>%
filter(Indicator == "GDP",
Year >= 1801) %>%
group_by(Year) %>%
summarise(Value = mean(Value, na.rm = TRUE)) %>%
ggplot(aes(x=Year, y=Value)) +
geom_line() +
labs(title = "Mean Country GDP Growth per Capita per Year (Annual %), 1801-2019",
caption = "Limited to reporting countries",
y = "GDP Growth per Capita (Annual %)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
data_long %>%
filter(Indicator == "Energy") %>%
ggplot(aes(x=Year, y= Value, group=Country)) +
geom_line(alpha = 0.2) +
geom_line(data = data_long %>%
filter(Indicator == "Energy",
Country == "United States"), aes(x=Year, y=Value, color = Country)) +
scale_colour_manual(values=c("red")) +
labs(title = "Country Energy Use (kg of Oil Equivalent per Capita), 1960 to 2015",
caption = "Limited to reporting countries",
y = "Energy Use (kg of Oil Equivalent per Capita)")+
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 15),
legend.title = element_blank())
Let’s see who the top countries are. First let’s take a look at the year 2000, and then 2014.
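The filtering code for the following tables is not shown; it was presumably analogous to the ranking approach used earlier for emissions, a sketch of which is:

```r
# Top 10 energy users in 2000 (use Year == 2014 for the second table)
data_long %>%
  filter(Indicator == "Energy", Year == 2000) %>%
  mutate(rank = dense_rank(desc(Value))) %>%
  filter(rank <= 10) %>%
  arrange(rank) %>%
  select(-rank)
```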
# A tibble: 10 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Qatar 2000 Energy Use (kg, oil-eq… Energy 18400 Rest of the …
2 Bahrain 2000 Energy Use (kg, oil-eq… Energy 12000 Rest of the …
3 Iceland 2000 Energy Use (kg, oil-eq… Energy 11100 Rest of the …
4 United Arab Emir… 2000 Energy Use (kg, oil-eq… Energy 9990 Rest of the …
5 Kuwait 2000 Energy Use (kg, oil-eq… Energy 9130 Rest of the …
6 Canada 2000 Energy Use (kg, oil-eq… Energy 8240 Rest of the …
7 United States 2000 Energy Use (kg, oil-eq… Energy 8060 United States
8 Trinidad and Tob… 2000 Energy Use (kg, oil-eq… Energy 7760 Rest of the …
9 Luxembourg 2000 Energy Use (kg, oil-eq… Energy 7680 Rest of the …
10 Brunei 2000 Energy Use (kg, oil-eq… Energy 7160 Rest of the …
# A tibble: 10 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Qatar 2014 Energy Use (kg, oil-eq… Energy 18600 Rest of the …
2 Iceland 2014 Energy Use (kg, oil-eq… Energy 17900 Rest of the …
3 Trinidad and Tob… 2014 Energy Use (kg, oil-eq… Energy 14400 Rest of the …
4 Bahrain 2014 Energy Use (kg, oil-eq… Energy 10600 Rest of the …
5 Kuwait 2014 Energy Use (kg, oil-eq… Energy 8960 Rest of the …
6 Brunei 2014 Energy Use (kg, oil-eq… Energy 8630 Rest of the …
7 Canada 2014 Energy Use (kg, oil-eq… Energy 7880 Rest of the …
8 United Arab Emir… 2014 Energy Use (kg, oil-eq… Energy 7770 Rest of the …
9 United States 2014 Energy Use (kg, oil-eq… Energy 6960 United States
10 Saudi Arabia 2014 Energy Use (kg, oil-eq… Energy 6940 Rest of the …
data_long %>%
filter(Indicator == "Energy") %>%
group_by(Year) %>%
summarise(Value = sum(Value, na.rm = TRUE)) %>%
ggplot(aes(x = Year, y = Value)) +
geom_line() +
labs(title = "Worldwide Energy Use (kg of Oil Equivalent per Capita), 1960 to 2015",
caption = "Limited to reporting countries",
y = "Energy Use (kg of Oil Equivalent per Capita)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 15))
Mortality <-data_long %>%
filter(Indicator == "Deaths") %>%
ggplot(aes(x = Year, y = Value, group = Country)) +
geom_line(alpha = 0.2) +
geom_line(data = data_long %>%
filter(Indicator == "Deaths",
Country == "United States",
Year >= 1960,
Year <= 2019),
aes(x = Year, y = Value, color = Country)) +
scale_colour_manual(values=c("red")) +
labs(title = "Country Crude Mortality Rate (per 1000 Persons), 1960 to 2019",
caption = "Limited to reporting countries",
y = "Crude Mortality Rate (per 1000 Persons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
Mortality
Let’s take a look to see which countries account for the large peaks in mortality in the late 1970s and the early 1990s. It’s always a good idea to check your data when you see anomalies like this.
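The code behind the next two tables is not shown; presumably it ranked death rates within a single year, something like this sketch (1978 here, and Year == 1993 for the second table):

```r
# Three highest crude mortality rates in 1978
data_long %>%
  filter(Indicator == "Deaths", Year == 1978) %>%
  mutate(rank = dense_rank(desc(Value))) %>%
  filter(rank <= 3) %>%
  arrange(rank) %>%
  select(-rank)
```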
# A tibble: 3 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Cambodia 1978 Deaths/1000 People Deaths 54.0 Rest of the World
2 Timor-Leste 1978 Deaths/1000 People Deaths 28.3 Rest of the World
3 Niger 1978 Deaths/1000 People Deaths 26.7 Rest of the World
Looks like Cambodia is the country with the large peak in the late 1970s. If you look up the history of Cambodia during this time period, you will see that the peak we are seeing makes sense because Cambodia experienced war and genocide during this time.
# A tibble: 3 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Rwanda 1993 Deaths/1000 People Deaths 41.4 Rest of the World
2 Sierra Leone 1993 Deaths/1000 People Deaths 26.1 Rest of the World
3 Niger 1993 Deaths/1000 People Deaths 21.4 Rest of the World
Rwanda is the country with the peak in the early 1990s. This also makes sense because Rwanda experienced a civil war at this time.
What about now? What countries have the highest rates?
# A tibble: 10 x 6
Country Year Label Indicator Value Region
<fct> <dbl> <chr> <chr> <dbl> <chr>
1 Bulgaria 2017 Deaths/1000 People Deaths 15.5 Rest of the World
2 Latvia 2017 Deaths/1000 People Deaths 14.8 Rest of the World
3 Serbia 2017 Deaths/1000 People Deaths 14.8 Rest of the World
4 Lesotho 2017 Deaths/1000 People Deaths 14.7 Rest of the World
5 Ukraine 2017 Deaths/1000 People Deaths 14.5 Rest of the World
6 Lithuania 2017 Deaths/1000 People Deaths 14.2 Rest of the World
7 Hungary 2017 Deaths/1000 People Deaths 13.5 Rest of the World
8 Romania 2017 Deaths/1000 People Deaths 13.3 Rest of the World
9 Croatia 2017 Deaths/1000 People Deaths 13 Rest of the World
10 Georgia 2017 Deaths/1000 People Deaths 12.9 Rest of the World
It seems that many Eastern European countries currently have the highest mortality rates, as well as Lesotho.
Let’s make a plot of just these countries:
Mortality <- data_long %>%
filter(Indicator == "Deaths") %>%
group_by(Country) %>%
mutate(last_val = last(Value)) %>%
ungroup() %>%
mutate(rank = dense_rank(desc(last_val))) %>%
filter(rank <= 5) %>%
ggplot(aes(x = Year, y = Value, color = Country)) +
geom_line() +
scale_color_viridis_d()+
labs(title = "Country Crude Mortality Rate (per 1000 Persons), 1960 to 2019",
caption = "Limited to reporting countries",
y = "Crude Mortality Rate (per 1000 Persons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
direct.label(Mortality, method = list("angled.boxes"))
With the exception of Lesotho, we can see that the mortality rate appears to be steadily increasing in these countries.
What countries have the lowest reported rates? We can simply alter our plot code to rank mortality in ascending rather than descending order.
Mortality <- data_long %>%
filter(Indicator == "Deaths") %>%
group_by(Country) %>%
mutate(last_val = last(Value)) %>%
ungroup() %>%
mutate(rank = dense_rank(last_val)) %>%
filter(rank <= 5) %>%
ggplot(aes(x = Year, y = Value, color = Country)) +
geom_line() +
scale_color_viridis_d()+
labs(title = "Country Crude Mortality Rate (per 1000 Persons), 1960 to 2019",
caption = "Limited to reporting countries",
y = "Crude Mortality Rate (per 1000 Persons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
direct.label(Mortality, method = list("angled.boxes")) +
scale_x_continuous(expand = c(0.3,0))
Looks like many countries in the Persian Gulf region have the lowest rates of mortality.
Let’s make a plot of some of the countries that showed unusual patterns over time. We will include the US for comparison.
data_long %>%
filter(Indicator == "Deaths",
Country %in% c("United States",
"Rwanda",
"Cambodia",
"Qatar",
"Bulgria"))%>%
ggplot(aes(x = Year, y = Value, color = Country)) +
geom_line() +
labs(title = " Crude Mortality Rate (per 1000 Persons), 1960 to 2019",
caption = "Limited to reporting countries",
y = "Crude Mortality Rate (per 1000 Persons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
We can make a similar plot where each country is shown side by side by using the facet_grid() function of the ggplot2 package. We need to indicate which variable we would like to group the data by, marked with the ~ symbol. If we include a period (to indicate all other variables), we can change the orientation of the plots:
.~variable plots the subplots horizontally (the facet variable acts like the x-axis)
variable~. plots the subplots vertically (the facet variable acts like the y-axis)
data_long %>%
filter(Indicator == "Deaths",
Country %in% c("United States",
"Rwanda",
"Cambodia",
"Qatar",
"Bulgaria"))%>%
ggplot(aes(x = Year, y = Value)) +
geom_line() +
facet_grid(.~ Country)+
labs(title = " Crude Mortality Rate (per 1000 Persons), 1960 to 2019",
caption = "Limited to reporting countries",
y = "Crude Mortality Rate (per 1000 Persons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
We can see that recently the US has actually had higher mortality rates than Rwanda and Cambodia.
Let’s see how the US ranks.
data_long %>%
filter(Indicator == "Deaths") %>%
filter(Year == 2014) %>%
group_by(Country) %>%
mutate(last_val = last(Value)) %>%
ungroup() %>%
mutate(rank = dense_rank(desc(last_val)))%>%
filter(Country == "United States")# A tibble: 1 x 8
Country Year Label Indicator Value Region last_val rank
<fct> <dbl> <chr> <chr> <dbl> <chr> <dbl> <int>
1 United Stat… 2014 Deaths/1000 Peo… Deaths 8.24 United Sta… 8.24 68
The US ranked 68 out of 212 countries in 2014. This means that roughly 70% of the countries included had lower mortality rates than the US. See here and here for more information about mortality rates in the US.
Question Opportunity: How would you determine the total number of countries reporting in 2014?
Let’s see what the overall trend in mortality has been over time.
data_long %>%
filter(Indicator == "Deaths") %>%
group_by(Year) %>%
summarise(Value = mean(Value, na.rm = TRUE)) %>%
ggplot(aes(x = Year, y = Value)) +
geom_line(size = 1.4) +
labs(title = "Mean Country Crude Mortality Rate (per 1000 Persons), 1960 to 2018",
caption = "Limited to reporting countries",
y = "Crude Mortality Rate (per 1000 Persons)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
We can see that despite some countries showing different patterns of mortality, overall the world has experienced much lower rates of mortality than in previous decades.
Now let’s take a look at the disaster data for the US.
data_long %>%
filter(Indicator == "Disasters") %>%
ggplot(aes(x = Year, y = Value, group = Country)) +
geom_line() +
labs(title = "US Disasters, 1980 to 2019",
subtitle = "Drougths, Floods, Freezes, Severe Storms. Tropical Cyclones, Wildfires, and Winter Storms",
y = "Disaster Count") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.title = element_text(size = 16))
In general, it appears that the number of disasters has increased over time. We can add another ggplot2 layer by using geom_smooth() to add a trend line. There are several methods to do this. We will use the loess method, which stands for Locally Weighted Smoothing. This method fits a trend to the data but does not assume that the trend will follow a particular shape.
data_long %>%
filter(Indicator == "Disasters") %>%
ggplot(aes(x = Year, y = Value, group = Country)) +
geom_line() +
geom_smooth(method = "loess") +
labs(title = "US Disasters, 1980 to 2019",
subtitle = "Drougths, Floods, Freezes, Severe Storms. Tropical Cyclones, Wildfires, and Winter Storms",
y = "Disaster Count") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.title = element_text(size = 16))
We can see that the rate of disasters appears to be increasing over time. The added geom_smooth() layer has added a blue trend line, with the gray area indicating the confidence interval of the trend line.
But what is a confidence interval?
A confidence interval gives an estimated range of values which is likely to include the true value for the entire population (for example, all women in the US) when we are using just a small sample (for example, 200 women in the US who take a survey) of the entire population. See here for more explanation about samples and populations.
Thus the gray area shows other possible trend lines that may fit the data of the actual population.
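As a quick illustration of this idea (using simulated values, not the case study data), we can compute a 95% confidence interval for a population mean from a sample of 200 observations with the base t.test() function:

```r
# Illustrative only: simulated heights (in inches) for a sample of 200 people
set.seed(123)
sample_heights <- rnorm(200, mean = 65, sd = 3)

# t.test() reports a 95% confidence interval for the population mean by default
t.test(sample_heights)$conf.int
```

The interval narrows as the sample size grows, reflecting greater certainty about the population mean.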
We could think of our data as the entire population. We have the actual counts for the number of disasters (based on specific definitions of disasters) that occurred in the US each year from 1980 to 2019. Therefore, we would not need to calculate confidence intervals, so we can remove them from our plot by using the se = FALSE argument of the geom_smooth() function of ggplot2. Confidence intervals are plotted by default because data is more often from a small sample of the true population, and we try to generalize the trends based on our sample to the true population.
data_long %>%
filter(Indicator == "Disasters") %>%
ggplot(aes(x = Year, y = Value, group = Country)) +
geom_line() +
geom_smooth(method = "loess", se = FALSE) +
labs(title = "US Disasters, 1980 to 2019",
subtitle = "Drougths, Floods, Freezes, Severe Storms. Tropical Cyclones, Wildfires, and Winter Storms",
y = "Disaster Count") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.title = element_text(size = 16))
How about national average temperatures in the US?
data_long %>%
filter(Indicator == "Temperature") %>%
ggplot(aes(x = Year, y = Value, group = Country)) +
geom_line() +
geom_smooth(method = "loess", se = FALSE) +
labs(title = "US Average Annual Temperature, 1895 to 2019",
y = "Temperature (Fahrenheit)") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.title = element_text(size = 16))
Temperatures also appear to be rising, especially in the last 10-20 years.
Now let’s try putting some of the different data types together. We have previously used facet_grid() to plot multiple subplots simultaneously. Now we will use the facet_wrap() function of the ggplot2 package, which also plots multiple subplots simultaneously; however, it also allows for different y-axis scales across the subplots. This is preferable in this case because it would be difficult to see the data if all the subplots were plotted with the same y-axis scale, as you can see here:
ggplot(data_long, aes(x = Year, y = Value, group = Country)) +
geom_line(alpha = 0.2) +
geom_line(data = data_long %>%
filter(Country == "United States"),
aes(x = Year, y = Value, color = Country)) +
scale_colour_manual(values = c("red")) +
facet_grid(~Indicator,
scales = "free_y") +
labs(title = "Distribution of Indicators by Year and Value",
y = "Indicator Value") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
strip.text = element_text(size = 16, face = "bold"))To use facet_wrap() with the option for a different y-axis scale for each subplot, we need to set the scales argment equal to "free_y". We can also indicate where we would like the label for the subplots to be located by using the strip.position argument. Notice that we can change the size or style of the font for these labels using the strip.text = argument of the theme() function. We can also specify how many rows or columns we would like the subplots to be shown.
ggplot(data_long, aes(x = Year, y = Value, group = Country)) +
geom_line(alpha = 0.2) +
geom_line(data = data_long %>%
filter(Country == "United States"),
aes(x = Year, y = Value, color = Country)) +
scale_colour_manual(values = c("red")) +
facet_wrap(Indicator~.,
scales = "free_y",
strip.position = "right",
ncol = 1) +
labs(title = "Distribution of Indicators by Year and Value",
y = "Indicator Value") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
strip.text = element_text(size = 16, face = "bold"))Here we will facet by two variables: Indicator and Region (which is if the data is from the US or other countries). First we will fiter out the data about disasters and temperature as this is only for the US, by using the filter() function and != which indicates “not equal to”. In this case we want to use facet_grid() instead of facet_wrap() so that the same y-axis will be used across the rows.
data_long %>%
filter(Indicator != "Disasters" ,
Indicator != "Temperature") %>%
ggplot( aes(x = Year, y = Value, group = Country)) +
geom_line() +
facet_grid(Indicator ~ Region,
scales = "free_y")+
labs(title = "Distribution of Indicators by Year and Value",
y = "Indicator Value") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
strip.text = element_text(size = 16, face = "bold"))We can also use scales = "free" with facet_wrap() to have a different axis for each plot.
data_long %>%
filter(Indicator != "Disasters" ,
Indicator != "Temperature") %>%
ggplot( aes(x = Year, y = Value, group = Country)) +
geom_line() +
facet_wrap(Indicator ~ Region,
scales = "free",
ncol = 2)+
labs(title = "Distribution of Indicators by Year and Value",
y = "Indicator Value") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
strip.text = element_text(size = 16, face = "bold"))From these plots we can see that each type of data spans a different time span.
Let’s take a look at the reporting countries for each year for each type of data for the global indicators. To calculate the number of reporting countries we will use the tally() function of the dplyr package to get the count for each year and indicator. Thus we will first group by Year and the Label variable (as a proxy for the Indicator variable), as Label also specifies the different indicators but with additional information and formatting so that we can have nice labels in our plot.
We can also add vertical lines to our plot using the geom_vline() function of the ggplot2 package.
data_long %>%
filter(Indicator != "Disasters" &
Indicator != "Temperature") %>%
group_by(Year, Label, .drop=FALSE) %>%
tally() %>%
ggplot(aes(x = Year, y = n, color = Label)) +
geom_line() +
geom_vline(xintercept = 1980, linetype = 2, color = "black") +
geom_vline(xintercept = 2014, linetype = 2, color = "black") +
labs(title = "Countries with Complete Data per Year",
subtitle = "Global Data",
y = "Countries") +
scale_x_continuous(breaks = seq(1750,2020,by=10),
labels = seq(1750,2020,by=10)) +
theme(axis.text.x = element_text(angle = 90),
axis.title.x = element_blank(),
legend.position = "bottom") +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12, angle = 90),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16),
legend.title = element_blank())
We can see that all of our data spans from 1980 to 2014.
What about the US data? We can summarize the data based on the minimum and maximum Year value using the summarize() function. Recall that this is a function of the dplyr package.
data_long %>%
filter(Country == "United States") %>%
group_by(Label) %>%
summarize(First_year = min(Year), Last_year = max(Year))
# A tibble: 6 x 3
Label First_year Last_year
<chr> <dbl> <dbl>
1 CO2 Emissions (Metric Tons) 1800 2014
2 Deaths/1000 People 1960 2017
3 Energy Use (kg, oil-eq./capita) 1960 2015
4 GDP Growth/Capita (%) 1801 2019
5 Number of Disasters 1980 2019
6 Temperature (Fahrenheit) 1895 2019
Now we will plot a segment line for each time span using geom_segment() and points for the first and last years using geom_point(). We will again add vertical lines using geom_vline() to show where the time spans overlap.
data_long %>%
filter(Country == "United States") %>%
group_by(Label) %>%
summarize(First_year = min(Year), Last_year = max(Year)) %>%
ggplot(aes(y = Label, x = Last_year)) +
geom_segment(aes(y = Label,
yend = Label,
xend = Last_year,
x = First_year )) +
geom_point(aes(y = Label, x = First_year)) +
geom_point(aes(y = Label, x = Last_year)) +
geom_vline(xintercept = 1980, linetype=2) +
geom_vline(xintercept = 2014, linetype=2) +
labs(title = "Complete Data per Year",
subtitle = "US-specific Data",
y = "Countries") +
scale_x_continuous(breaks = seq(1750,2020,by=10),
labels = seq(1750,2020,by=10)) +
theme_linedraw() +
theme(axis.text.x = element_text(size = 12,angle = 90),
axis.text.y = element_text(size = 12),
axis.title = element_blank(),
plot.caption = element_text(size = 12),
plot.title = element_text(size = 16))
It looks like the overlapping time span for the different datasets is from 1980 to 2014.
Now let’s animate some of the global indicators over time using the gganimate package.
animation_2 <- data_long %>%
filter(Indicator=="Energy") %>%
ggplot(aes(x = Year,
y = Value,
group = Country,
color = Region,
size = Region,
alpha = Region)) +
geom_point() +
scale_color_manual(values = c("Red","Black")) +
scale_alpha_manual(values = c(0.5, 1)) +
scale_size_manual(values = c(0.25, 2)) +
labs(title="Distribution of Indicators by Year and Value, 1980-2010",
y = "Energy Use per Capita") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90)) +
transition_time(as.integer(Year)) +
shadow_wake(wake_length = 1, alpha = FALSE)
animate(animation_2, fps = 10, duration = 5)
animation_3 <- data_long %>%
filter(Indicator == "GDP") %>%
ggplot(aes(x = Year,
y = Value,
group = Country,
color = Region,
size = Region,
alpha = Region)) +
geom_point() +
scale_color_manual(values = c("Red","Black")) +
scale_alpha_manual(values = c(0.1, 1)) +
scale_size_manual(values = c(0.25, 2)) +
labs(title="Distribution of Indicators by Year and Value, 1980-2010",
y= "GDP Growth per Capita (%)") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90)) +
transition_time(as.integer(Year)) +
shadow_wake(wake_length = 1, alpha = FALSE)
animate(animation_3, fps = 10, duration = 5)
animation_4 <- data_long %>%
filter(Indicator == "Emissions") %>%
ggplot(aes(x = Year,
y = Value,
group = Country,
color = Region,
size = Region,
alpha = Region)) +
geom_point() +
scale_color_manual(values = c("Red","Black")) +
scale_alpha_manual(values = c(0.1, 1)) +
scale_size_manual(values = c(0.25, 2)) +
labs(title = "Distribution of Indicators by Year and Value, 1980-2010",
y = "CO2 Emissions (Mg)") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90)) +
transition_time(as.integer(Year)) +
shadow_wake(wake_length = 1, alpha = FALSE)
animate(animation_4, fps = 10, duration = 5)
Let’s plot all of the US indicators together.
US_Indicators <- data_long %>%
filter(Country=="United States")%>%
filter(Year>=1980) %>%
ggplot(aes(x=Year, y=Value)) +
geom_line() +
geom_smooth(method = "loess", se = FALSE) +
facet_wrap(Label~., ncol=2, nrow=3, scales = "free_y") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 90, size = 12),
axis.text.y = element_text(size = 12),
strip.text.x = element_text(face = "bold", size = 12),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 12)) +
labs(title = "US-specific Indicators")
US_Indicators
We can also plot the data with points for the individual values rather than a line using geom_point().
US_Indicators_point <- data_long %>%
filter(Country=="United States")%>%
filter(Year>=1980) %>%
ggplot(aes(x=Year, y=Value)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE) +
facet_wrap(Label~., ncol=2, nrow=3, scales = "free_y") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 90, size = 12),
axis.text.y = element_text(size = 12),
strip.text.x = element_text(face = "bold", size = 12),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 12)) +
labs(title = "US-specific Indicators")
US_Indicators_point
If we really want to look at how two indicators may relate to each other, it is important that both datasets span the same period of time. Therefore, we will limit this plot to only the years where the data overlaps for both CO2 emissions and temperature.
data_long %>%
filter(Country == "United States") %>%
filter(Year>=1980) %>%
filter(Year<=2010) %>%
filter(Indicator == "Emissions"|
Indicator == "Temperature") %>%
ggplot(aes(x=Year, y=Value)) +
geom_line() +
geom_smooth(method = "loess", se = FALSE) +
scale_x_continuous(breaks = seq(1980,2010,by=5),
labels = seq(1980,2010,by=5)) +
facet_wrap(Indicator~., scales = "free_y", ncol=1) +
theme_classic() +
theme(axis.text.x = element_text(angle = 90),
axis.title = element_blank()) +
labs(title="US Emissions and Temperatures (1980-2010)")We can see that there are very similar patterns of CO2 emission levels an average annual temperatures. We will analyze this further in a bit. There are also some other common visualization techniques that utilize the mean of a set of values over a time span to show how values are changing over time in a different way.
To create such a visualization, we will first calculate the mean of our emissions and temperature values from 1980 to 2010 and create a new variable called Mean. Then we will calculate the difference of each value from the mean and create a new variable for these values called Diff_from_mean. Finally, we will also create a factor variable based on the sign of the Diff_from_mean value to distinguish positive from negative changes. We will use this to color our plots.
data_long_us <- data_long %>%
filter(Country == "United States") %>%
filter(Year >= 1980,
Year <= 2010) %>%
group_by(Indicator) %>%
mutate(Mean = mean(Value),
Diff_from_mean = Value - Mean) %>%
ungroup() %>%
mutate(Diff_color = sign(Diff_from_mean)) %>%
mutate(Diff_color = as.factor(Diff_color))
Rows: 186
Columns: 9
$ Country <fct> United States, United States, United States, United St…
$ Year <dbl> 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, …
$ Label <chr> "CO2 Emissions (Metric Tons)", "CO2 Emissions (Metric …
$ Indicator <chr> "Emissions", "Emissions", "Emissions", "Emissions", "E…
$ Value <dbl> 4720000, 4540000, 4310000, 4340000, 4480000, 4490000, …
$ Region <chr> "United States", "United States", "United States", "Un…
$ Mean <dbl> 5134194, 5134194, 5134194, 5134194, 5134194, 5134194, …
$ Diff_from_mean <dbl> -414193.548, -594193.548, -824193.548, -794193.548, -6…
$ Diff_color <fct> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1…
Ok, now we will create a plot that shows a bar for the change in each indicator from the mean value across the timespan for each year.
data_long_us %>%
filter(Indicator == "Emissions"|
Indicator == "Temperature"|
Indicator == "Disasters") %>%
ggplot(aes(x=Year, y=Value)) +
geom_segment(aes(x=Year, y=Value, xend=Year, yend=Mean, color=Diff_color), size=3.25) +
scale_color_manual(values = c("blue","red")) +
geom_hline(aes(yintercept=Mean), linetype=1, color="black") +
facet_wrap(Indicator~., scales = "free_y", ncol=1) +
theme_classic() +
theme(axis.text.x = element_text(angle = 90),
axis.title = element_blank(),
legend.position = "none") +
labs(title = "US Disaters, Emissions, and Temperatures (1990-2010)",
subtitle = "Indicator Mean of 1990-2010 Represented by Solid Black Line") We can see from this plot that overall there has been an increase in Disasters, Emissions and Temperature in the most recent years.
What if we look at mortality, gdp, and disasters in the same way?
data_long_us %>%
filter(Indicator=="Deaths"|
Indicator=="GDP"|
Indicator =="Disasters") %>%
ggplot(aes(x=Year, y=Value)) +
geom_segment(aes(x=Year, y=Value, xend=Year, yend=Mean, color=Diff_color), size=3.25) +
scale_color_manual(values = c("blue","red")) +
geom_hline(aes(yintercept=Mean), linetype=1, color="black") +
facet_wrap(Indicator~., scales = "free_y", ncol=1) +
theme_classic() +
theme(axis.text.x = element_text(angle = 90),
axis.title = element_blank(),
legend.position = "none") +
labs(title = "US Emissions and Temperatures (1990-2010)",
subtitle = "Indicator Mean of 1990-2010 Represented by Solid Black Line") Luckily at this point depsite increased disaster rates, mortality appears to be decreasing. The GDP has been relatively consistent.
Now how about energy use and emissions?
data_long_us %>%
filter(Indicator=="Emissions"|
Indicator == "Energy") %>%
ggplot(aes(x=Year, y=Value)) +
geom_segment(aes(x=Year, y=Value, xend=Year, yend=Mean, color=Diff_color), size=3.25) +
scale_color_manual(values = c("blue","red")) +
geom_hline(aes(yintercept=Mean), linetype=1, color="black") +
facet_wrap(Indicator~., scales = "free_y", ncol=1) +
theme_classic() +
theme(axis.text.x = element_text(angle = 90),
axis.title = element_blank(),
legend.position = "none") +
labs(title = "US Emissions and Energy Use (1990-2010)",
subtitle = "Indicator Mean of 1990-2010 Represented by Solid Black Line")Looks like in very recent years, there has been a decline in both energy use and CO2 emissions.
We see that CO2 emission levels, annual average national temperatures, and disasters counts appear to be increasing over time, but how can we assess this statistically?
We can use correlation. Correlation is a measure of the strength of the relationship between two variables. Often when we describe correlation we are referring to linear correlation and therefore the linear relationship between variables.
If we plot one variable on the x-axis and the other variable on the y-axis, we can see:
If the points form a clear upward-sloping line, then there is a strong positive relationship. If the points do not really form a line, then there is a weak linear relationship or no linear relationship; there may, however, be a nonlinear relationship if the points create a different but defined shape. See here for more information. If the points form a downward-sloping line, then there is a negative relationship.
The numbers below each plot above are called correlation coefficients. They range from -1 to 1. A value of zero indicates that there is no correlation between the variables, while a value of 1 or -1 indicates perfect correlation; the closer the coefficient is to 1 or -1, the stronger the relationship. The sign of the coefficient indicates the direction of the relationship. If there is a negative relationship, then the variables show opposing changes: as one gets larger, the other gets smaller. If the sign is positive, then the variables increase together.
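To make this concrete, here is a small illustration (using simulated values, not the case study data) of how the correlation coefficient reflects the shape of a scatter plot:

```r
# Illustrative only: simulated data showing the three patterns described above
set.seed(1)
x <- 1:50
y_pos  <- x + rnorm(50, sd = 5)    # clear upward line: strong positive correlation
y_neg  <- -x + rnorm(50, sd = 5)   # clear downward line: strong negative correlation
y_none <- rnorm(50)                # no linear pattern: correlation near 0

cor(x, y_pos)
cor(x, y_neg)
cor(x, y_none)
```

Plotting each pair with geom_point() would show the upward line, downward line, and shapeless cloud that these three coefficients summarize.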
Let’s take a look at our data now. We have already plotted our US indicators across time, but we did not look at the linear trend. We will do that now using geom_point() to plot the individual data points, and this time we will use the "lm" method, which stands for linear model, for our geom_smooth() layer.
US_Indicators_lm <- data_long %>%
filter(Country=="United States")%>%
filter(Year>=1980) %>%
ggplot(aes(x=Year, y=Value)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(Label~., ncol=2, nrow=3, scales = "free_y") +
theme_linedraw() +
theme(axis.text.x = element_text(angle = 90, size = 12),
axis.text.y = element_text(size = 12),
strip.text.x = element_text(face = "bold", size = 12),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 12)) +
labs(title = "US-specific Indicators")
US_Indicators_lm
First let’s create a wide tibble for our US data so that we can compare the indicators to one another. To do this we will create a wide tibble using the pivot_wider() function of the tidyr package. This function requires values for two arguments, names_from and values_from. The variable that contains the identity or labels for the multiple new variables you wish to create is used for the names_from argument. The variable that contains the corresponding values for the new variables is used for the values_from argument.
wide_US <- data_long %>%
filter(Country == "United States") %>%
filter(Year>=1980) %>%
filter(Year<=2010) %>%
select(-Label) %>%
pivot_wider(names_from = Indicator, values_from = Value)
Rows: 31
Columns: 9
$ Country <fct> United States, United States, United States, United State…
$ Year <dbl> 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 198…
$ Region <chr> "United States", "United States", "United States", "Unite…
$ Emissions <dbl> 4720000, 4540000, 4310000, 4340000, 4480000, 4490000, 450…
$ GDP <dbl> -1.1300, 1.5000, -2.8200, 3.2500, 6.3600, 2.9500, 2.5000,…
$ Energy <dbl> 7940, 7650, 7260, 7200, 7440, 7460, 7380, 7620, 7850, 789…
$ Deaths <dbl> 8.8, 8.6, 8.5, 8.6, 8.7, 8.7, 8.7, 8.6, 8.9, 8.8, 8.6, 8.…
$ Disasters <dbl> 3, 2, 3, 5, 2, 5, 2, 0, 1, 5, 3, 4, 7, 5, 6, 5, 4, 3, 10,…
$ Temperature <dbl> 52.39, 53.12, 51.34, 51.88, 51.97, 51.30, 53.32, 53.33, 5…
We can use the cor.test() function of the stats package to calculate Pearson’s correlation estimates, as well as confidence intervals for the correlation estimates. This function allows for a few different methods to calculate correlation estimates. The default is the Pearson’s product-moment method. All three methods result in a correlation coefficient that ranges from -1 to 1 and indicates the strength of the association or relationship between the variables. However, each method uses a slightly different calculation.
See here and here for more information.
In the case of our data, the individual points would be the measurements for each variable at each year.
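The three estimates printed below were presumably produced with calls along these lines (a sketch; the original code chunk is not shown here, and cor() is used to print the bare coefficient, while cor.test() would additionally report a confidence interval and p-value):

```r
# Presumed calls (sketch): Pearson correlation of each US indicator with time,
# using the wide_US tibble created above.
cor(wide_US$Year, wide_US$Temperature)
cor(wide_US$Year, wide_US$Emissions)
cor(wide_US$Year, wide_US$Disasters)
```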
[1] 0.4453282
[1] 0.9099066
[1] 0.5803805
We can see that in all cases, there appears to be a positive linear relationship between the tested US indicators and time. We see that the relationship is positive because the correlation estimates are positive (as well as the fact that our plots show upward linear relationships).
We can assess the strength of the relationship based on this table, which provides general guidelines:
We also see that the correlation of temperature with time is low, while it is very strong for the emissions data and moderate for the disaster data.
We can also use a linear regression to evaluate the relationship between our variables. This helps us to answer slightly deeper questions like:
Do changes in time predict or explain changes in temperature, emission, or disaster values?
We can get the results of a linear model by applying the base summary() function to the output of the lm() function of the stats package.
The variable on the left of the ~ is what we are trying to predict, also known as the dependent variable, while the variable(s) on the right are the independent (predictor) variables.
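The model summaries below were presumably produced with calls like the following. A hedged sketch; a minimal synthetic version of wide_US is defined here so the snippet runs on its own.

```r
# Minimal synthetic stand-in for the wide_US data frame glimpsed above
wide_US <- data.frame(Year = 1980:1989,
                      Temperature = c(52.4, 53.1, 51.3, 51.9, 52.0,
                                      51.3, 53.3, 53.3, 52.5, 52.8))

# dependent variable ~ independent variable
fit <- lm(Temperature ~ Year, data = wide_US)
summary(fit)  # coefficients, R-squared, F-statistic, p-values
```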
Call:
lm(formula = Temperature ~ Year, data = wide_US)
Residuals:
Min 1Q Median 3Q Max
-1.4681 -0.5662 0.1051 0.5750 1.2987
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -28.26165 30.26858 -0.934 0.3582
Year 0.04064 0.01517 2.678 0.0121 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7556 on 29 degrees of freedom
Multiple R-squared: 0.1983, Adjusted R-squared: 0.1707
F-statistic: 7.174 on 1 and 29 DF, p-value: 0.01206
Call:
lm(formula = Emissions ~ Year, data = wide_US)
Residuals:
Min 1Q Median 3Q Max
-544444 -125131 39181 151431 316431
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -90376432 8085611 -11.18 4.98e-12 ***
Year 47875 4053 11.81 1.33e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 201800 on 29 degrees of freedom
Multiple R-squared: 0.8279, Adjusted R-squared: 0.822
F-statistic: 139.5 on 1 and 29 DF, p-value: 1.327e-12
Call:
lm(formula = Disasters ~ Year, data = wide_US)
Residuals:
Min 1Q Median 3Q Max
-3.6331 -1.1774 -0.2702 0.8800 5.2520
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -313.0746 82.7922 -3.781 0.000722 ***
Year 0.1593 0.0415 3.838 0.000620 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.067 on 29 degrees of freedom
Multiple R-squared: 0.3368, Adjusted R-squared: 0.314
F-statistic: 14.73 on 1 and 29 DF, p-value: 0.00062
In all cases, yes: all of these indicators show a positive association with time (the slope estimates for Year are positive), meaning they increase over time.
Notice that the Multiple R-squared value in the output is equal to the correlation coefficient squared!
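This identity holds for any simple linear regression with an intercept. A quick check on synthetic data (not the case-study data):

```r
set.seed(1)
x <- 1:30
y <- 0.04 * x + rnorm(30)  # noisy upward trend
fit <- lm(y ~ x)

summary(fit)$r.squared                          # Multiple R-squared
cor(x, y)^2                                     # squared Pearson correlation
all.equal(summary(fit)$r.squared, cor(x, y)^2)  # TRUE: they match
```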
Since CO2 emission levels and average annual temperatures appear to follow similar patterns over time, we might want to analyze whether these values are correlated with one another.
We might also ask:
Are emission levels associated with temperature levels or disaster levels? Can we predict or explain temperature or disaster levels based on emission levels?
Let’s first plot both on the same plot, where emissions will be plotted on one axis and temperatures on the other.
wide_US %>%
  ggplot(aes(x = Emissions, y = Temperature)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90),
        axis.title = element_blank()) +
  labs(title = "US Emissions and Temperature (1980-2010)")

wide_US %>%
  ggplot(aes(x = Temperature, y = Disasters)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90),
        axis.title = element_blank()) +
  labs(title = "US Disasters and Temperatures (1980-2010)")

wide_US %>%
  ggplot(aes(x = Emissions, y = Disasters)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90),
        axis.title = element_blank()) +
  labs(title = "US Emissions and Disasters (1980-2010)")

Now we can calculate Pearson correlation coefficients.
[1] 0.5429183
[1] 0.09489648
[1] 0.4686469
And finally, perform linear regression analysis:
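The model calls behind the three summaries below presumably look like this. A sketch; the small wide_US defined here is a synthetic stand-in so the snippet runs on its own.

```r
# Synthetic stand-in for wide_US, for illustration only
wide_US <- data.frame(
  Emissions   = c(4720, 4540, 4310, 4340, 4480, 4490) * 1000,
  Temperature = c(52.39, 53.12, 51.34, 51.88, 51.97, 51.30),
  Disasters   = c(3, 2, 3, 5, 2, 5)
)

summary(lm(Temperature ~ Emissions,   data = wide_US))
summary(lm(Disasters   ~ Temperature, data = wide_US))
summary(lm(Disasters   ~ Emissions,   data = wide_US))
```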
Call:
lm(formula = Temperature ~ Emissions, data = wide_US)
Residuals:
Min 1Q Median 3Q Max
-1.4512 -0.4182 -0.0627 0.5443 1.1704
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.798e+01 1.394e+00 34.405 <2e-16 ***
Emissions 9.416e-07 2.705e-07 3.481 0.0016 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7087 on 29 degrees of freedom
Multiple R-squared: 0.2948, Adjusted R-squared: 0.2704
F-statistic: 12.12 on 1 and 29 DF, p-value: 0.0016
Call:
lm(formula = Disasters ~ Temperature, data = wide_US)
Residuals:
Min 1Q Median 3Q Max
-4.8260 -1.5306 0.0827 1.1797 7.4708
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.3939 29.3623 -0.354 0.726
Temperature 0.2854 0.5559 0.513 0.612
Residual standard error: 2.526 on 29 degrees of freedom
Multiple R-squared: 0.009005, Adjusted R-squared: -0.02517
F-statistic: 0.2635 on 1 and 29 DF, p-value: 0.6116
Call:
lm(formula = Disasters ~ Emissions, data = wide_US)
Residuals:
Min 1Q Median 3Q Max
-3.8160 -1.2161 -0.2805 0.9640 6.1595
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.872e+00 4.411e+00 -1.785 0.08479 .
Emissions 2.444e-06 8.556e-07 2.857 0.00783 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.242 on 29 degrees of freedom
Multiple R-squared: 0.2196, Adjusted R-squared: 0.1927
F-statistic: 8.162 on 1 and 29 DF, p-value: 0.007833
Both temperature and the number of disasters per year appear to increase with increased CO2 emissions, as the p-value is less than 0.05 in both regressions. However, the Multiple R-squared values and Pearson correlation coefficients were only moderate for both relationships, suggesting that the relationships are not very strong and that there are likely other factors besides CO2 emissions that influence temperature and disasters. This makes sense with what we know about the Earth's atmosphere. Other greenhouse gases contribute to warming temperatures, and the CO2 already present in the atmosphere also traps heat and greatly impacts the temperature. Furthermore, we are looking at US emissions and how they influence US temperatures, but other countries also produce CO2 emissions. Other factors influence disaster rates as well, such as humidity levels and rainfall in the case of fires.
While the analyses that we performed give us some indication of how these different datasets relate to one another, one would realistically want to perform a mixed-effects model or growth curve analysis to account for the fact that these data are paired across time points and may vary differently with time, and to include the other factors that we just discussed.
Now we will create a plot that summarizes our major findings. We will use the plot_layout() function of the patchwork package.
(CO2_world | Top10) / US_Indicators +
  plot_layout(widths = c(1, 2), heights = unit(c(4, 5), c('cm', 'null')))

png(here::here("img", "mainplot.png"), width = 900, height = 700)
(CO2_world | Top10) / US_Indicators +
  plot_layout(widths = c(1, 2), heights = unit(c(4, 5), c('cm', 'null')))
dev.off()

quartz_off_screen
                2
Ask students to create a plot with labels showing the countries with the lowest CO2 emission levels.
Ask students to plot energy use and emissions on a scatter plot, calculate the Pearson correlation coefficient, and discuss what the results mean.
Even though there is substantial scientific evidence that CO2 emissions trap heat and lead to increased global temperatures, it is important to realize that there are other factors involved in the relationship between US CO2 emissions and US annual average temperatures. However, it is vital that we work around the globe to reduce greenhouse gas emissions to mitigate the increased temperatures that we will experience due to the CO2 already in the atmosphere, so that the warming temperatures aren't as extreme as they could be. Furthermore, we need to prepare for increased rates of natural disasters and consider how these may affect people around the world. Evidence suggests that impoverished people are the most affected by disasters; we need to be particularly mindful of this as we prepare.
For more on confidence intervals in linear regression, see: https://rstudio-pubs-static.s3.amazonaws.com/195401_20b3272a8bb04615ae7ee4c81d18ffb5.html
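In R, confidence intervals for fitted regression coefficients can be obtained with the confint() function of the stats package. An illustrative sketch on synthetic data (not the case-study data):

```r
# Synthetic yearly temperatures with a small upward trend
set.seed(42)
d <- data.frame(Year = 1980:2010)
d$Temperature <- 50 + 0.04 * (d$Year - 1980) + rnorm(31, sd = 0.8)

fit <- lm(Temperature ~ Year, data = d)
confint(fit, level = 0.95)  # 95% CI for the (Intercept) and Year slope
```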